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ABSTRACT 



The present invention concerns improved motion estimation 
in signal records. A method for estimating motion between 
one reference image and each frame in a sequence of frames, 
each frame consisting of a plurality of samples of an input 
signal comprises the steps of: transforming the estimated 
motion fields into a motion matrix, wherein each row 
corresponds to one frame, and each row contains each 
component of motion vector for each element of the refer- 
ence image; performing a Principal Component Analysis of 
the motion matrix, thereby obtaining a motion score matrix 
consisting of a plurality of column vectors called motion 
score vectors and a motion loading matrix consisting of a 
plurality of row vectors called motion loading vectors, such 
that each motion score vector corresponds to one element for 
each frame, such that each element of each motion loading 
vector corresponds to one element of the reference image, 
such that one column of said motion score matrix and one 
motion loading vector together constitute a factor, and such 
that the number of factors is lower than or equal to the 
number of said frames; wherein the results from the Prin- 
cipal Component Analysis on the motion matrix are used to 
influence further estimation of motion from the reference 
image to one or more of the frames. 

43 Claims, 6 Drawing Sheets 
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Fig. 5 



.3, 



If 



510 



560 

J- 









EstMovSeq 

/ ii' ' 








Mov 



520 



530 

J 



Reference 
image model 



EstModel 

~1 
590 



Hyp 



EstHypSeq 

1 

550 



540 



SOP 



— 580 



12/10/2003, EAST Version: 1.4.1 



U.S. Patent Dec 5, 2000 Sheet 3 of 6 6,157,677 



Fig. 6 
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Fig. 7 
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Fig. 8 
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METHOD AND APPARATUS FOR 
COORDINATION OF MOTION 
DETERMINATION OVER MULTIPLE 
FRAMES 

RELATED APPLICATIONS 

The application is related to the following applications 
assigned to the same applicant as the present invention and 
filed on even date herewith, the disclosure of which is 
hereby incorporated by reference: 

Method and apparatus for multi-frame based segmenta- 
tion of date streams, U.S. Ser. No. 08/930,341. 
Method and apparatus for depth modelling and providing 
depth information of moving objects U.S. Ser. No. 
08/930339. 

FIELD OF THE INVENTION 

This invention relates generally to the parameterization of 
each of a set of large, related data records. More specifically, 
it concerns improved motion estimation in sets of related 
signal records, e.g. video frames. 

BACKGROUND OF THE INVENTION 

Within video modelling and compression, motion estima- 
tion and motion compensation is important. Without it 
moving objects and other motions are difficult to describe 
efficiently in applications like video compression and inter- 
active video games. Singh Ajit (1991, Optical Flow Com- 
putation. IEEE Computer Society Press) describes general 
methods for motion estimation. 

Motion estimation is usually done from one frame to 
another frame, say, from a frame m to a frame n, for whose 
intensities we use the term I m and I M . 

When good statistical precision and accuracy of the 
motion estimation is required, it is important to use all the 
available information efficiently in the motion estimation. 
This means that if moving physical objects or phenomena 
are repeatedly observed in several frames, increased preci- 
sion and accuracy may be attained if the motion estimation 
is coordinated between these repeated observation frames. 

However, motion estimation is normally a computation- 
ally demanding operation, in particular when full motion 
fields, with one vertical and horizontal motion parameter for 
each individual pixel in l m or I„ are to be determined. 

Motion estimation can also be very memory demanding. 
Any simultaneous motion estimation for many frames is 
bound to be exceptionally demanding. 

On the other hand, full motion field estimation on the 
basis of only two individual frames is underdetermined: For 
many pixels, an equally good fit can be found with a number 
of different motion estimates, although only one of these 
corresponds to the original physical movement of the objects 
imaged. 

When physical objects can be observed to move system- 
atically over several frames, their motions are generally such 
that if their true two-dimensional (2D) motion fields had 
been known, these would have systematic similarities from 
frame to frame. Due to these systematic similarities, the 
motion fields of a number of related frames could theoreti- 
cally be modelled with relatively few independent param- 
eters. This modelling would in turn have led to very efficient 
compression and editing techniques for video. 

However, in practice, the true motion fields cannot be 
determined from empirical data. First of all there will be 
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more or less random errors in the obtained motion fields due 
to more or less random noise contributions in the raw data. 
Worse, due to the underdetermined nature of full motion 
estimation the probability of finding the 'true* motion field 

5 is low. A different set of spurious false motion estimates may 
be chosen for each frame. 

Thus, existing methods and apparatuses for determining 
motion for a number of frames, based on individual frame 
pairs, have several drawbacks: 

io 1. The lack of coordination in the motion estimation for 
the different frames makes it difficult to model the set of 
motion estimation fields efficiently and hence attain good 
compression of these without loss of fidelity, and good 
editability control. 

15 2. The motion estimates are unnecessarily imprecise due 
to sensitivity to random noise in the images, since the 
methods do not employ the stabilizing fact that the same 
non -random objects or phenomena are seen in several 
frames. 

2Q 3. The motion estimates are unnecessarily inaccurate due 
to the uuderdeterminate nature of many motion estimation 
problems. The imprecise, inaccurate results represent an 
over-parameterization that may fit the individual pairs of 
frames well, but have bad interpolation/extrapolation 

25 properties, and do not give good approximations of the true, 
but unknown physical motions. 

4. Attempts at coordinating the motion estimation by 
treating many frames in computer memory at the same time 
are computationally and memorywise very demanding. 

30 OBJECTS OF THE INVENTION 

It is therefore an object of the present invention to provide 
a technique for coordinating the motion estimation for a set 
of many frames, so that the set of motion estimates can be 

35 modelled effectively to give good compression and good 
editability control. 

Another object of the invention is to coordinate the 
motion estimation for a set of many frames in order to obtain 
higher precision and accuracy in the motion estimation for 

40 each of them, by discriminating between on one hand 
systematic motion patterns shared by several frames, and on 
the other hand apparent motion patterns that are unique for 
each frame and that are possibly due to random noise effects 
and estimation ambiguity. 

45 Yet another object is to attain more precise and accurate 
modelling of the true, unknown causal motion patterns, by 
probabilistically biased restriction of the motion estimation 
for each frame towards to its coordination with that of the 
other frames. 

50 It is yet another object of the invention to implement the 
technique so that it does not require very much processing 
power or computer memory, yet allows coordination of a 
high number of related frames. 

It is yet an object of the invention to provide a method that 

55 can employ both non-linear and linear modelling methods 
for the probabilistical biased restriction. 

It is also an object to provide a technique that employs 
multifirame modelling of other data than motion data in order 
to improve the estimation of motion itself. 

60 Finally, it is an object of the invention to provide a method 
that employs motion estimation, -modelling and 
-compensation to make other multiframe data than motion 
data more suitable for bilinear modelling. 

65 NOTATION AND DEFINITIONS 

In the following, the symbol '** is used for multiplication 
when needed, (except in FIG. 6, where it symbolizes 
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iteration). The symbol 'x' is used for representing dimen- FIG. 7 shows the data structure for an input hypothesis to 

sions of a matrix (e.g. Size-nRowsxnColumns). Boldface the motion estimator for some frame n. 

uppercase letters are used for representing data matrices, and FIG. 8 shows the data structure of the output from the 

boldface lowercase letters for data vectors. The terms Prin- motion estimator for some frames n, with respect to point 

cipal Component Analysis, PCA, Bilinear Modelling and 5 estimates for the hypothesis and the various hypothesis 

BLM are used synonymously in the following, to represent impact measures reflecting its expected statistical properties. 

spatiotemporal subspace modelling. FIG. 9 illustrates the rule-based handling of slack infor- 

CMUUADV ^ TTjr .xrv/tKrrinM mation in the iterative version of the EstHyp operator, with 

SUMMARY OF THE INVENTION respea lQ poim estimales of the molion and ils reliabUity 

Coordination of motion estimation over several frames 10 information reflecting estimates of its statistical properties, 
are attained by approximating the motion estimates by m which the molion field for a given frame is modified pixel 
bilinear modelling. The bilinear model represents a subspace bv P»cl according to how the field fits to the model 
approximation of the motion fields of several frames. The representing the motion estimates of other fields, 
parameters of the bilinear model — loading vectors, score BASIC IDEA 
vectors and residuals — are estimated by principal compo- 
nent analysis or some related method. The bilinear models Given the importance in e.g. video coding, of establishing 
are defined relative to a reference image. valid and reliable motion fields for each frame, as well as 

The motion estimation for a given frame is simplified and valid and/or rcliabIc motion representation for a whole 

stabilized by the use of preliminary bilinear motion param- ^ sequence, the present invention enables the accumulation 

eter values established prior to this motion estimation. These and ^ of motion information from many frames, thereby 

preliminar bilinear parameter values are used both for gen- reducing estimation ambiguity, occlusion problems and 

erating a relevant start hypothesis for the motion estimation noise sensitivity, even with limited computer resources, 

and for conducting the motion estimation for the frame The basic idea is to develop and maintain a common 

towards the corresponding motion patterns found for other 25 mathematical model description of whatever systematic 

frames previously. motion patterns are found for a set of pixels in a set of 

The bilinear motion model in the end summarizes the framcs > and this for lhe improvement of the motion 

common motion patterns for objects in a set of related estimate for each mdmdual frame. The mathematical model 

frames. lnat summ arizes the systematic motion patterns can be of 

„ , , c . , . <• ,„ different kinds. On one hand the model must have sufE- 

Several different control structures for the multi-frame 30 , . , , 

.... . j u . < , ciently many independent parameters to describe the 

bilinear motion modelling are described. . , / , r t . J; 4 , . , . . , 

& t required motions adequately. On the other hand it should be 

Special bilinear parameter estimation methods are sufficiently simple to statistical restriction of the 

described, involving spatial and temporal smooting as well underdetermined, noise-sensitive motion estimation prob- 

as reweighting and optimal scaling. lem Hence> the number of independent parameters of the 

The bilinear motion modelling is combined with bilinear model should be dynamically, depending on the need for 

modelling of motion compensated intensity changes in two modelling flexibility (avoiding underfitting) and the need for 

different ways, for enhanced motion estimation as well as for noise rejection (avoiding overfitting). 

flexible pattern recognition. The parameters in mathematical models are used for 

„ dn communicating common systematic variation patterns 

BRIEF DESCRIPTION OF THE DRAWINGS 40 between frames m order t0 enhaQce the motioo Nation 

FIG. 1 illustrates how a frame-size (with nvxnh pixels) for each frame * model parameters are in rum esti- 

motion field in one motion direction (here: DV^ for Delta mated b V suitable methods in order to accumulate and 

Vertical address, i.e. vertical motion, for each pixel from combine systematic motion information for the set of 

reference image R to image n) can be strung out as a 45 fr ames - 

one-dimensional vector with nv*nh elements; One kind of applicable mathematical modelling type is to 

FIG. 2 illustrates how two frame-size (nvxnh pixels each) approximate the common change patterns by a multidimen- 

motion fields DA^-fDV^ and DH^J for the Vertical and S10nal a*htive model, which can also be seen as a subspace 

Horizonal directions) can be strung out together as a one- model or a ' blhn ear model*. Central in the present invention 

dimensional vector with 2 nv*nh elements, for the case 50 & that this subspace model may contain more than one 

when both motion directions are modelled simultaneously; dimension. Central is also that the definition of the subspace 

FIG. 3 is an illustration of how a matrix X can be ™ del * ^ ^« °f theory driven-that is, it is 

modelled by the bilinear product of two lower-rank matrices f erram f d " a . *™ P a 'tially-from empirical data, not 

f-p* i-j7" , ■ j * f . „ from mathematical functions such as sines and cosines. 
T*F plus a residual matrix E; 

p-p .... , , - - 55 The method will be explained with regards to an appli- 

FIG. 4 illustrates the parameters from FIG. 3 pertaining to „ ftt : rtri r™ on t tA r • 

. c r cation tor 2D images: the parameterization of motion in 

one single frame; . , c ° . .... A . . . . 

° video coding for compression or editing control. It is also 

FIG. 5 shows the first preferred embodiment, in which the applicable for ID data structures (time warping in sound 

whole sequence or sequence of frames is treated jointly with analysis, line camera motion estimation in process control) 

respect to motion estimation (in block EstMovSeq), model 60 and for 3D data structures (e.g.MRI scans of human brains), 
estimation (in block EstModel) and hypothesis generation 

(in block EstHypSeq). Motion Data Representation for Multi-frame 

FIG. 6 shows the block diagramme for the part of the Modelling 

second preferred embodiment that concern the iterative The motion field in video has a Vertical component DV 

combination of motion estimation (in block EstMov) moder 65 and a horizontal component DH. They will collectively be 

updating (in block EstModel) and hypothesis estimation (in referred to as optical flow field or motion field DA ('Delta 

block EstHyp). Address'). 
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In some video coding methods, several signal records 
(frames) are related to one common 'reference image'. One 
example of this is the IDLE codec type, as described in 
patent application WO95/08240, Method and apparatus for 
Data Analysis, where the motion, intensity changes and 5 
other modelled change information for a number of 
(consecutive) frames is directly or indirectly represented 
relative to a common 'extended Reference image model' 
(symbolized by index R), for a given segment of pixels (a 
spatial 'holon') in a given sequence of related frames n=l, 10 
2, .... An IDLE type decoder using reference image model 
is described in W095/134172 Apparatus and method for 
decoding video images. 

Hence, the motion field subscripted DA^ represents how 
the individual pels in the Reference image model are to be 15 
moved in the vertical and horizontal directions in order to 
approximate the input frame n. 

In the present invention each motion direction may be 
modelled separately. 

20 

FIG. 1 shows how the vertical motion field DV^ with 
nvxnh pets can be strung out as a row vector with nv*nh 
elements. Motion fields for several frames represents several 
such vectors of the same size, which can then be modelled 
together. 25 

In the present invention the different motion field direc- 
tions can also be modelled jointly. FIG. 2 shows how both 
the vertical and horizontal motion fields can be stored in one 
row vector, now with 2*nv*nh elements. Again, such vec- 
tors for several frames will have the same sizes, and can thus 30 
be modelled together. 

Sub-space Factor Modelling of Motion Data 

In the present invention the estimated motion fields for a 
set of frames at a given point in time are modelled together 35 
in a sequence model, and this model is used for stabilizing 
the motion estimation for the individual frames, whereupon 
these newly motion estimates are used for improving the 
sequence model, etc. 

A preferred implementation of modelling method is the 40 
use of manifolds with limited number of independently 
estimated parameters, such as a neural net with few hidden 
nodes, estimated by e.g. back propagation (see e.g. Widrow, 
B. and Lehr, M. A. (1990) 30 years of Adaptive Neural 
Networks: Perceptron, Madaline and Backpropagation. Pro- 
ceedings for the IEEE, vol 78,9, pp 1415-1442.). Among the 
manifold types, the linear ones, which can be seen as spaces 
and subspaces, are preferable, due to computation speed, 
flexible choice of implementation, well understood theoreti- 
cal properties and easily interpreted results. This is described 50 
in detail in Martens, H. and Naes, T. (1989) Multivariate 
Calibration. J.Wiley & Sons Ltd, Chichester UK. 

Improved Sub-space Modelling by the Use of a 

Common Reference Position 55 

In order for the motion fields for several frames to be 
modelled efficiently together, they should preferably be 
represented in a common reference position. The use of this 
common reference position is also important in order to 60 
allow efficient modelling of intensity changes and other 
properties of the elements in the frames. These properties 
may have to be motion compensated in order to be moved 
back to this reference position for modelling. A reference 
image 1^ may be chosen or constructed from the set of 65 
related frames I„. This reference image could e.g. be the first, 
middle or last image in the sequence n=l,2, . . . ,N, or a 
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combination of information from several frames. Except at 
the very beginning of the encoding process for a video 
sequence, there are usually two or more segments (holons) 
being modelled more or less seperately and therefore each 
having its own reference image information. (More details 
on segmentation into holons aie given in the patent appli- 
cation "Method and apparatus for multi-frame based seg- 
mentation of data streams", mentioned before, and more 
details on depth estimation on multiple frames are given in 
the application "Method and apparatus for depth modelling 
and providing depth information of moving objects", men- 
tioned before. The reference image information for the 
different holons may be stored separately in several refer- 
ence images I^holon), bolon=l,2, . . . , or stored jointly in 
a collage reference image 1^. 

The representation of the spatial parameters in one com- 
mon reference position has three advantages, relative to 
motion estimation: 

1) The motion estimates become more robust against 
input data noise and against incidental motion estimation 
errors, and hence will have more reliability and validity. 

2) The motion estimates from the different related frames 
may be more easily modelled mathematically and hence 
more easy to compress and/or edit (for video coding) and/or 
lo control later (for video games and virtual reality). 

3) The motion estimation process may be faster, since the 
information from various images serve as effective statistical 
constraints. 

Algebraic Description of the Sub-space Modelling 

Some necessary algebraic tools will first be described. 

The purpose of the subspace modelling is to attain a 
somewhat flexible, but sufficiently restrictive model of the 
systematic covariations in a set of motion field estimates. 
The subspace description can be formulated as a bilinear 
factor model. 

More details on the bilinear modelling is given in H. 
Martens & Naes,T. (1989) Multivariate Calibration. J.Wiley 
& Sons Ltd Chichester UK. Here is a summary: 

The motion field vectors from a number of frames n=l,2, 
. . . nFrames may be stromes in a matrix and subjected to 
multivibrate modelling in order to find the compact yet 
flexible approximation to enhance the motion estimation. 

Each row in matrix X in FIG. 3 may be the motion field 
vector of a frame, in one or more motion directions. If the 
data for all the frames are represented at the same reference 
position, then each column in X may be seen as observed 
properties of its corresponding ' reference position pixel' 
pel=l,2, . . . ,nPels. 

These observed properties, showing for instance the stron 
out motion data defining ho the intensity value of the 
reference pixels \ R should be moved in order to reconstruct 
the realted frames I„,n-1,2, . . . ,nFrames, may be approxi- 
mated by a bilinear model (BLM): Matrix X can be written 
as a sum of a low number of more or less common change 
phenomena ('latent variables', factors, principal 
components) f-1,2, . . . , nFactors, plus residuals: 

x-xi + x 2+ . . . +>U, C( _ + E 

where 

X is the data to be modelled, — it has one row for each 
frame modelled and one column for each pixel variable 
to be modelled simultaneously (e.g. one horizontal and 
one vertical motion element for each pixel.) 
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Xj, X 2 , . . . ,X 4 , . . . yXnFactor, are the individual factor One alternative to such a regression method for estimating 

contributions spanning the major systematic covaria- scores, both for motion, for intensity changes and for other 

tions patterns in X, — same matrix size as X. data domains, is nonlinear iterative optimization e.g. stan- 

E represents the Error or unmodelled residual — with the dard SIMPLEX optimizabon (see J. A. Nelder and R. Mead, 

same matrix size as X. 5 'A Simplex method for function rninimizabon\ Computer 

Each factor contribution f«l,2, . . . ,nFactors is defined as Journal 7, p. 308-313), in which from a starting value of 

the outer product of two vectors: scores successive improvements in scores are sought so as to 

minimize some criterion, e.g. a function of the intensity 

x r t f w Pf lack-of-fit between the reference image \ R modified accord - 

io ing to the scores and the frame I n to be approximated from 

w ~; re . the reference image. Combinations of the regression 

The representation of the motion fields inside a common ch and ^ nonlinear ileralive fiuing may also be used . 

subspace model f ensures that all motions modelled in the Conversely, the bilinear model in equatioan (1) can also 

sequence belong to a common set of 'systematic motion bc writtCD fof cach individual pixcl: 

patterns . 15 

Additional constraints may be added to scores t„ to favour x p e f^P P ^& e pei ( 4 ) 

smooth temporal movements whenever possible; similarly where 

the loadings may be smoothed to favour even spatial motion x ^ nFramesxl 

patterns. These additional constraints may be imposed after j ^ n FramesxnFactors 

the bilinear rank reduction modelling, or included in the on * n . « 

, 20 Voei ^ nractorsxl 

actual bihnear modellmg. p . ^ 

Various aspects of the current invention for tailoring the .{*' ... , ., , , , 

rank-reducing model estimation method to the present needs Hence m situations when data x^, are available for some 

will be described later: An aspect for integrating the rank n ™ P lx ^ s > ra a c " tam of rames 0=1,2, . . . ^Frames 

reduction of pea and spatotemporal smoothing, and an „ ^ the scores T are available for these frames for a set 

aspect for delayed, adaptive point estimation to enhance 25 ^factors f=l,2, ... ,nFactors based on omer pixels, but their 

inter-frame coordination. loadln S valucs ^> arc ™known for these new pixels, then 

The number of factors MA ... to be retained in these these loadlDg caD be f 1 ™^ by projecting the data 

models may be determined as described by Joliffe (1986), x p" on XOKa T ' b * olAm ^ least ^ uares re 8tession: 

Jackson (1991) or Martens & Maes 1989, mentioned above, 30 / =(T r, T)- 1 *T r "jc (5) 
e.g. by cross validation. 

More details will be given below in conjunction with 

Linear Modelling special inventions in this regard. 

„. . ,. . , . When the motion fields DA„,„ n=l,2, ... from a set or 

Since loading matrix subspace P represents the systematic subs6( of frames m defined x then ^ b ~ 

motion patterns that have been found to be more or less valid 35 sub pr nin , he most si jfi^, row of x 

for all the frames analyzed, it may be expected that the tef)resenis tbe molion information more or less common to 

motion vector found in an individual frame n in the same frames in the ^ ^ vectors t for 

sequence should also be well approximated a representation frames n=1>2 , . . . , estimated for each frame separately or for 

inside this subspace P. Its position inside the subspace P many frames join , (see below)> ^ (0 m , his CQm . 

corresponds to its score vector t H -[t„ 1>tn2 t„^ OCK> „]. 40 m0Q molion Mom;itioa pr ^ tQ each ^dividual frame 

So, as FIG. 4 shows, the bilinear model can for an n-1 2 

individual frame n be written: T^e parameters in a bilinear model, i.e. loading and score 

x =/ *v T +e (d parameters T and P, as well as the residuals, arise from a 

" n " statistical estimation process, e.g. taking the first few factors 

where 45 from a singular value decomposition of X. These factors 

data x is lxnPels ideally represent the main relevant information in X. But 

scores" t is lxnFactors they als ° cont ™ more or less est i™ation noise. A bilinear 

n T model gives better separation the lower the number of 

loadings: P is nfactorsxnpels factors m the model is compared to the number of obser _ 

residual: e„ is lxnPels 50 vations used for determining the bilinear model's param- 

An offset may be included in the bilinear models (confr. eters. 

Martens, H. and Naes, T. (1989) Multiva rate Calibration. The uncertainty covariance of the model parameters T and 

J.Wiley & Sons Ltd, Chichester UK),— this is for simplicity p ma y be estimated by approximation theory. For instance, 

ignored in the present explanation. assuming residual elements in E are normal distributed 

Based on the motion data (e.g. stringed-out motion field 55 N(0,s 2 ), these uncertainties can be estimated by: 

estimate) for frame n, x„, and on subspace loading P, the Covariance of scores- Cov(t WPW^s 2 

scores t„ and residual vector e can be estimated. A number ^variance of loadings: CovV^^T^" 1 ^ 2 

of different estimation method can be employed, as long as . _ , f , a * u j * , 

t u~*, ™^ *u A i . ■ .u ■ j t Covariance of reconstructed data x„Hyp and of residuals 

they generally make the elements in the residual vector " 1V 

small. Weighted linear regression is one good method (see 60 E . cov(/„p ,HA,+A D** 2 (6) 

Weisberg S. (1985) Applied linear regression. 2nd ed., " 

J.Wiley & Sons, New York, and Martens, H. and Naes, T. where 

(1989) Multivariate Calibration. J.Wiley & Sons Ltd, Chich- leverage of frame n=h„=diag(T(T r T)- :l T 7 ) 

ester UK): For diagonal weight matrix W the scores are then leverage of pixel pel=h pe/ =diag(P(P :r P)- :1 P 7 ) 

estimated by: 65 Alternative information about the uncertainty of the 

reconstructed motion fields (i.e. x n Hyp) can be obtained 

t^x^w-P'CP^WP)- 1 ( 3 ) from: 
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a) Residual intensity after applying the motion field: 
Large positive or negative intensity residual for a pel indi- 
cates invalid motion, e.g. due to occlusion problems or 
systematic intensity changes. 

b) Slack: An estimate of the ambiguity or unreliability of 5 
a motion field may obtained by detecting how much the 
motion value for a pixel can be modified in different 
directions from its present value in x n Hyp before the result- 
ing intensity lack-of-fit increases significantly compared to 

an certain intensity noise level. 10 

In estimation of scores for a new object, the scores' 
covariance for the different factors may be estimated from 
that frame's noise variance s^: Cov(Q^(V T P)~ 1 *s n 2 , In 
estimating the loadings of a new pixel, the loadings' cova- 
riance for the different factors may be estimated from the 15 
pixel's noise variance s^, 2 : Cov(p^ c ^ r T T)*s pW 2 . The 
variances involved may be based on a priori knowledge or 
estimated from the data themselves after suitable correction 
against overfitting, as e.g. described by Martens, H. and 
Naes, T. (1989) Multivariate Calibration. J.Wiley & Sons 20 
Ltd, Chichester UK. 

In some applications, certain known variation patterns are 
expected or suspected a priori to occur Parameters describ- 
ing such a priori variation patterns may be included in the 
modelling, thereby eliminating the need for estimating the 25 
corresponding parameters from the data themselves. If 
known spatial variation patterns are available, they may be 
included in the loading matrix P, as factors for which only 
scores need to be estimated. If known temporal variation 
patterns are available, they may be included in score matrix 30 
T, as factors for which only loadings need to be estimated. 
If both their spatial and temporal parameters are known, they 
can be included in the bilinear loading and score model 
without any parameter estimation. 

Choice of Motion Estimator 35 

The motion estimator should preferably be of some sort of 
estimation type that is able to make use of the x n Hyp 
information and its associated hypothesis impact measures. 
In the motion estimation of DA /2fI =x„, the advantage of good 40 
fit between the Reference image \ R and the present image \ n 
must be balanced against the advantage of good agreement 
with the other frame's motion estimates, as conveyed by the 
bilinear x„Hyp, — as well as against fit to other hypotheses, 
e.g. about temporal and spatial smoothness. An example of 45 
such a motion estimator is given in WO095/26539 Method 
and apparatus for estimating motion, which is hereby 
included by reference. 

The motion estimator is preferably based on mapping the 
lack-of-fit for each pixel position in \ R to I„ w.r.t various 50 
alternative motions around some expected motion field used 
as offset from the pixel position in \ R . For each pixel position 
in 1^ various input hypotheses to the motion estimator are 
used for making the motion estimation less underdeter- 
mined: The empirical lack-of-fit for the different alternative 55 
motions are shrunk towards zero in those areas where the 
motions are expected according to the hypotheses. Subse- 
quent spatial smoothing is applied to the shrunk lack-of-fit 
data in order to favour continous motion fields, and the 
minimum of this smoothed shrunk lack-of-fit is taken for 60 
each pixel in \ R as its preliminary motion estimate. This 
motion estimate is further filtered and modified according to 
depth/occlusion analysis, resulting in the motion estimates 
DAj^, which for the bilinear matrix algebra is also termed 

x «- 65 

Alternatively, the motion estimator may be based on 
phase-correlation to detect the main motion types, followed 
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by an interpretation procedure that ascribe the different 
motions detected to the different parts of the picture; the 
hypotheses may be used both to modify the phase 
-correlation map (e.g. adding extra correlation where x M Hyp 
has validity) and the subsequent interpretation phase 
(putting a premium on motions where the phase correlation 
motions and the hypotheses agree). 

Other motion estimators may also be used. 

Application of Bilinear Modelling in Conjunction 
with Motion Estimation 

XHat represents the subspace approximation of X, in the 
sense that both the scores T and the loadings P have nFactors 
column vectors that span an nFactors dimensional subspace. 
The T subspace describes the main variations and covara- 
tions between the frames involved, and the P subspace 
describes the corresponding main variations and covaria- 
tions between the variables (pixels) involved. 

Estimation Methods for Bilinear and Linear 
Modelling 

Bilinear Modelling (BLM) 

There are a number methods for extracting the most 
salient subspace from a matrix X, as described by Martens 
& Naes 1989, mentioned above as well as in Jolliffe, I. T. 
(1986) Principal Component Analysis. Springer Series in 
Statistics, Sp ringer- Verlag New York, and in Jackson, J. E. 
(1991) A User's guide to principal components. J. Wiley & 
Sons, Inc. New York. Common to them is that they extract 
the major covariation patterns in X into XHat with as few 
factors as possible, leaving the more or less unsystematic or 
unique variances in residual E. Principal component analysis 
(pea) or statistically trunkated singular value decomposition 
(eliminating small singular value structures) can be used in 
the context of the present invention. 

PLS regression may be used if external information is 
available, to which the motion estimation should be coor- 
dinated. One example of this to use sound information (e.g. 
energy at different frequency channels or from different 
filters) for the frames as Y variables, and use the motion data 
as X variables as described here. Another example is to use 
time shifted motion data as Y variables. 

Vertical and horizontal motion may be modelled in a 
coordinated way, if so desired, by the use of two-block 
bilinear modelling, e.g. by PLS2 regression, (Martens, H. 
and Naes, T (1989) Multivariate Calibration. J. Wiley & 
Sons Ltd, Chichester UK.) 

If motion is estimated for more than one objects (ho Ions), 
then the bilinear modelling of the motion data may be 
coordinated by the use of some N-way linear method 
(hierarchical multi-block bilinear method or bilinear con- 
sensus method), such as Consensus PCA (Geladi, P., 
Martens, H., Martens, M., Kalvenes, S. and Esbensen, K. 
(1988) Multivariate comparison of laboratory measure- 
ments. Proceedings, Symposium in Applied Statistics, 
Copenhagen Jan. 25-27, 1988, Uni-C, Copenhagen 
Danmark, pp 15-30.) ty is a column vector with one element 
for each frame. Each element ^describes how this factor f 
manifests itself in frame n. Vector ty is here called the score 
vector, and its values may be positive, null or negative. 

Vector p/"(the transpose of vector p 7 ) is a row vector with 
one element for each variable analyzed (e.g. for each pixel). 
Each element p^ describes how this factor f manifests itself 
for variable k. Vector py is here called the loading vector of 
factor f. Its values may be negative, null or positive. A 
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restriction on the vector length of if or on py is usually or, for individual pixel in frame n: 
imposed to avoid affine algebraic ambiguities, e.g. that the 

sum of the squared elements in t„ should be 1. x^Wv^n'?^ 

The full factor model can then be written ^ loadin p ^ beeQ frQm ffi0tion dala fof 

or on matnx form, illustrated in FIG. 3: 5 ^ to ^ ^ 

scores t„ for frame n are at first estimated by temporal 
^ ^ ^y" ^ , p T + E forecast from other frames; if the bilinear modelling is used 

£i iterative ly in the motion estimation, new scores may be 

10 obtained by modelling preliminary estimate x n io terms of 
x = T* + E loadings P as described above. With the hypothesis are also 

the corresponding covariances Cov(t n p / , e/ ) or other reliabil- 
wnerc ity statistics estimated for each pixel, e.g. as described 

T-rt ft f-l,2. . . . .nFactorsl is the matrix of scores for the d ^ x ^' , ... , , , • . A a 

. L f. ' ' ' . . J _ . * « This bilmear hypothesis may be used in two different 

bilinear factors, — it has one row for each frame mod- 15 . 

elled and one column for each bilinear factor modelled, ^Y 5 * 

f=l 2 nFactors. a ) ^° save C P U anc ^ memor y> as an offset or start point for 

' „ ir . , - 1 r the time- and memory demanding search process of 

™ tt lpAt M li2, ... ,nractors| is the matrix of loadings for „ 

i -i m- r • , , r ... motion estimation 
the bilinear facors, — it has one column for each pixel 

variable to be modelled simultanelusly and one row for 20 b ) To improve prec^ion and inter-frame coordination: An 

each bilinear factor model f=l,2, . . . , nFactors. The a P non statislical expectation, used e.g. for modifying 

superscript 7, means 'transposed'. the intensit y differences to favour this result within the 

The factor contribution matrix product T^P 7 * can be n0ise level of the data - 

expressed as an approximation of data matrix X and is hence ^ bilinear sub sp ace hypothesis x„Hyp, may in the 

termed matrix XHat: 25 P resent invention be used for stabilization and coordination 

of the motion estimation for the corresponding frames, 

x=XHat*E provided that the motion estimator used in the system is of 

a type that can utilize such offsets and additional statistical 

The bilinear modelling tools described above are in the distribution expectation hypotheses. The main effect of this 

present invention used for three different purposes: 30 can be summarized as: 

1) Improvement of motion estimation for the individual Without the bilinear hypotheses x„Hyp to connect the 
frame motion of the different frames, the full motion field estima- 

2) Motion modelling for a sequence of frames tion for P ixels relative to an individual frame n is normally 

3) Enhancement of motion estimation by multi-domain hi ^ ly ^ may * fS™^ 
modelline motions with good fit, i.e. that appear equally probable, and 

Each of these will now be briefly outlined below. mav ,hus ' ^ c ^ n f - f ve 1 uite different motion fie ' dsfor a 

given frame. With thermal intensity noise in the input 

1) Improvement of Motion Estimation for the frames, it is quite random which of these alternative motions 

Individual Frame ^ selected. This in turn makes modelling of the motion fields 

40 difficult, which in turn result in poor compression etc. In 
For motion estimation for an individual frame in a addition, without a good starting point for the search process 
sequence of related frames, the bilmear models based on the motion estimation can be very cpu and memory demand- 
other frames in the sequence are employed in order to m g 

improve the estimated motion field DAn for the individual with the use of the bilinear hypotheses ^Hyp in the 

frame. This may entail a bilinear definition of a start point 45 motion estimation process, for each pixel in each frame a 

(offset) for the search process, as well as a statistical motion pattern is chosen (from the set of alternative good-fit 

modification of the motion estimation through the use of motions) that also corresponds to the systematic, reliable 

motion hypotheses. motions found in other frames. Also, with a good starting 

The use of the bilinear model hypotheses is controlled so point for the search process the motion estimation becomes 

that reliable model information is used strongly, while less 50 less cpu and memory demanding. 

reliable model information is used only weakly or not at all. At each pixel there may be several different bilinear 

The offset and the hypotheses may be defined prior to the hypotheses, each corresponding to one given set of assump- 

motion estimation, or updated iteratively during the motion tions about the data. Other types of assumptions (e.g. 

estimation. This will be described in more detail below. smoothness for scores in time, smoothness for loadings or 

Lack of fit residual between reliable motion field data 55 motion field ^ s P ace ) mav y ield vet other ' additional 

DA„ and the bilinear model is used for detecting pixels that hypotheses. 

do not fit to the bilinear model— either because they repre- . Different hypotheses may be used simultaneously in a 

sent a new motion pattern not yet modelled, or because of £i ven motion estimation. 

errors in the data or in the bilinear model available. Hypothesis Reflects the Assumed Probability Distribution 

60 of the Expected Result 

Generation of Hypothesis Based on the Bilinear Each hypothesis x n Hyp for a frame represents a point 

Subspace Model estimate within the statistical probability distribution of 

The way information from other frames is conveyed to an w 5 ere x » * CX F°** t0 Ue ; > ud f?& fr ° m , the ""J 1 * 
individual frame n during motion estimation is in the shape "l fo ™ at .'°_ n "> the subspace formed by other frames. Asso- 
of a bilinear prediction hypothesis x„Hyp: 65 ^ late f ™ th P° mt estnnate 15 P^ferably also some more 

detail about how precise and how important the hypothesis 
jc„Hyp=/„-p r (7) is. This is oudined in FIG. 7. 
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For each pixel the actual values of each such hypothesis 
x„Hyp 710, 720 may therefore have reliability estimates 
associated with it, and from these a set of Hypothesis Impact 
Measures can be computed, later to be input to the motion 
estimation. The following is one practical set of descriptors 5 
for the hypothesis validity: 

1) The Hypothesis Strength 750. This defines how 
strongly the hypothesis shall be counted, relative to the 
lack-of-fit of the input intensity data. 

Pixels with unreliable or unsatisfactory hypothesis are 10 
given low weight and hence the hypothesis will have little or 
no impact on the ensuing motion estimation for this pixel. 

2) The Hypothesis Shift Range 730. This defines how the 
hypothesis for each individual pixel shall give credit also to 
alternative motions that are different, although similar to 15 
motion x„Hyp. 

3) The Hypothesis Propagation Range 740. This defines 
how this hypothesis should affect the motion estimation of 
nearby pixels. 

20 

2) Motion Modelling for a Sequence of Frames 

The second, and quite related, usage of bilinear modelling 
of motion fields concerns how to improve the modelling of 
motion patterns: By extracting the major eigenstructures or 
related dominant factor structures from motion fields from 25 
several related frames, given the same reference image 
coordinate system, the signal/noise ratio of the results can be 
greatly enhanced. 

For a set of related frames ' estimated motion fields, 
DA i? „,n=l,2, , . . ,nFrames, extract the motion patterns 30 
common to these frames by bilinear modelling of these 
motion estimates, in terms of the subspace spanned by a 
bilinear loadings P r , the corresponding scores T and residu- 
als E. 

This common modelling of the estimated motion fields 35 
may be done once and for all, or iteratively. In the case of 
iterative modelling, the estimated motion fields may be 
modified by certain rules to give optimal fit to a low- 
dimensional common bilinear model. 

40 

Details of these alternatives are described in the preferred 
embodiments. 

3) Enhancement of Motion Estimation by Multi- 
domain Modelling 

45 

During motion estimation — for an individual frame 1), or 
for a sequence of related frames 2), estimated changes in 
other domains, such as intensity, depth, transparancy or 
classification probability may also be modelled by bilinear 
approximation, in analogy to the bilinear approximation of 50 
the motion data. For instance, when there are gradual color 
changes in the sequence images to be submitted to motion 
estimation, e.g. due to changing in the lighting, these inten- 
sity changes may give errors in the motion estimation. By 
allowing some systematic intensity changes the sequence, 55 
the motion estimation can be made more accurate. But if too 
many intensity changes are allowed in the sequence, the 
motion estimation can be destroyed. 

The multifactor linear or bilinear modelling of allowed 
intensity change patterns provides a flexible, yet simple 60 
enough summary of the systematic intensity changes that do 
not appear to be due to motion. This is particularly so, if the 
intensity change loadings are known or have been estimated 
a priori, so that the probability of erroneously modelling 
motion effects in the intensity domain is minimized. 55 

Similarly, multifactor linear or bilinear modelling of 
depth, transparancy or classification probability can enhance 
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the motion estimation and modelling, by correcting for 
systematic changes that would otherwise impede the motion 
estimation. But if allowed too much flexibility, adaptive 
correction in these alternative domains can distroy the 
motion estimation. Therefore such multidomain modelling 
must be done with restraint: only clearly valid change 
patterns must be included in the multidomain models. There 
constraints can be relaxed during iterative processes as the 
bilinear models become less and less uncertain. 

The use of bilinear multidomain modelling in conjunction 
with motion estimation is described in more detail in the 
Fifth and Sixth Preferred Embodiments. 

Preferred Embodiments 

The stabilization of the motion estimation and simplifi- 
cation of the motion field modelling can now be done in the 
various ways for a given holon (or for the whole frame) in 
a set of related frames. 

A first embodiment of the present invention for multi- 
frame coordination of motion estimation is described in FIG. 
5. It consists of iterating between 1) Estimating the motion 
fields DA^ for all the frames n=l,2,3, . . . (relative to a 
reference frame R), and 2) Estimating the subspace and the 
hypothesis for all the frames. 

A second embodiment with more detail is illustrated in 
FIGS. 6 and 7. It consists of using, for any frame at any stage 
in the iterative estimation process, whatever subspace infor- 
mation available at this stage, for the stabilization of the 
motion estimation for individual frames, and then updating/ 
downdating the subspace estimate with the obtained indi- 
vidual motion estimates. 

A third embodiment employs a bilinear modelling tool 
that includes spatiotemporal smoothing as additional opti- 
mization criterion integrated into the estimation of the 
bilinear parameters. It operates on a given set of input data 
and a given set of statistical weights for rows and columns 
in these data. 

A fourth embodiment employs a bilinear modelling tool 
that allows several types of additional information and 
assumptions to be integrated into the estimation of the 
bilinear parameters. It includes iterative modification of the 
input data as well as of the statistical weights for these data. 

A fifth embodiment employs bilinear and multifactor 
linear modelling both in the motion domain and the intensity 
domain, to allow improved motion estimation on system- 
atically intensity-corrected images. 

A sixth embodiment represents a pattern recognition 
extension of the fifth embodiment, based on combining a 
priori empirically estimated bilinear models in the intensity 
domain (and optionally in the motion domain) with iterative 
pattern recognition search processes. 

FIRST PREFERRED EMBODIMENT 

Bilinear Modelling After Motion Estimation for 
Whole Sequence 

FIG. 5 shows a first embodiment of an apparatus 500 
according to the invention operates in its simplest form for 
a sequence. Based on input intensities I rt ,n=l,2, . . . 510 for 
the individual frames (plus possible reliability information), 
and on a reference image model 1^ 530, it delivers or outputs 
the desired motions estimates DA^n«l,2, ... at 570 and 
final hypotheses at 580. The apparatus 500 operates by 
having motion estimation done in a block 520 EstMovSeq 
for the whole sequence and hypotheses make in a block 550 
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EstHypSeq, with intermediate results stored in blocks 540 
and 560, respectively. EstMovSeq 520 estimates the motion 
fields, based on intensities I^n-l^, ... for the frames 
involved, and on the bilinear model information stored as 
part of the Reference Image Model, and using whatever 
hypothesis information 560 is available. EstModel 590 
estimates the bilinear model subspaces of the motion (and 
possibly that of systematic intensity changes as well), and 
updates the bilinear model information in Reference Image 
Model 530 with this. EstHypSeq 550 forecasts the hypoth- 
eses for each frame based on the new bilinear model 
information in the Reference Image Model 530 and on the 
output 540 from the EstMovSeq 520. 

The algorithm can be written as follows: 

Divide the sequence into shorter, more homogenous 
sequences, if necessary. 

One known method is to calculate a histogram of intensity 
or color distribution for each frame in the sequence, calcu- 
late a measure of similarity between histograms for pairs of 
frames, and assuming that when the similarity of histograms 
between pairs of frames is small then there is probably a 
scene shift. This method will catch some scene shifts, but not 
all. 

Define one or more holon's reference image 1^, e.g. a 
certain frame in the subsequence, a certain segment of a 
certain frame in the sequence, or an accumulated composite 
segment from several images. 

SUMMARY 

For each homogenous sequence and holon: 
While sequence estimate not converged 

Form hypotheses of the motion field for all frames in 

EstHypseq 550 
Estimate motion field for all frames in EstMovseq 520 
Estimate the bilinear motion model subspace in Est- 
Model 590 
Check convergence for the sequence 
End of while sequence estimate not converged 
The first preferred embodiment in more detail will now be 
described in more detail: 

While sequence estimate not converged 

Form hypotheses of the motion field for all frames in 

EstHypseq 550 
Renew hypothesis of the motion field estimate x„Hyp for 
each frame in EstHypSeq 550 from equation (7). Additional 
hypotheses may also be formulated (e.g. by temporal 
interpolation/extrapolation), but their discussion will for 
simplicity be postponed to the second preferred embodi- 
ment. 

Assess also the uncertainty of this estimate, and determine 
the hypothesis distributional reliability parameters, includ- 
ing estimated depth/folding and occlusions from other 
ho Ions. Frames that generally fit well to the general subspace 
model P without themselves having influenced the subspace 
definition very much (low frame leverage in T) are given 
high general hypothesis strength, — other frames are given 
lower strengths. Pixels with good fit to the subspace model 
without being very influential in the score estimation (low 
variable leverage in P) are given relatively high strength 
compared to the other pixels. Pixels for which the estimated 
uncertainty variance of the hypothesis is low are given 
relatively high strengths. Pixels for which the hypothesis is 
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they are near or inside estimated intra- or inter- holon occlu- 
sions are given low weights. 

The hypothesis ranges are defined such that early in 
iterative processes, before subspace P is well defined, the 
shift range and propagation range are generally set relative 
large. As the estimation process proceeds and P becomes 
more well defined, the ranges are generally reduced. The 
hypothesis shift range for individual pixels is set such that 
for pixels with satisfactory, but imprecise hypothesis the 
hypothesis is regarded as more or less applicable over a 
wider range of motions than for pixels with precise hypoth- 
esis. The hypothesis propagation range is set such that pixels 
with very clear hypothesis are allowed to affect the hypoth- 
esis of other pixels, e.g. in the neighborhood, if they have 
more unclear hypotheses. 

Estimate motion field for all frames in EstMovseq 520 

Estimate the motion fields x^DA^ from the reference 
frame 1^ to each of the frames I n) n=l,2, . . . ,nFrames in 
EstMovSeq 520, based on the available information: 1^ 
(or some transformation of 1^, preferably with known 
inverse) and their uncertainties. In addition, the motion 
estimation is stabilized by the use of various hypotheses 
x M Hyp, e.g. based on previously estimated bilinear loadings, 
and the hypotheses' distributional parameters such as 
hypothesis strength, hypothesis range and hypothesis propa- 
gation range, plus estimate of intra- holon depth/folding and 
occlusions from other holons. 

Estimate the bilinear motion model subspace in EstModel 
590 

Estimate the scores and loadings of the motion subspace 
in EstModel 590 by bilinear modelling of motion data 
X=(x„,n=l,2, . . . ,nFrames), e.g. singular value decompo- 
sition or weighted nonlinear iterative least squares (Nip a Is) 
modelling according to eq. 1, or by a bilinear estimator that 
includes spatio temp oral smoothing (see third preferred 
embodiment) and/or iterative Optimal Scaling (see fourth 
preferred embodiment). 

The estimation yields loadings P and scores T and residu- 
als E. Determine the statistically optimal number of factors 
in P and T, e.g. by cross validation (preferably a low 
number). Optionally, make similar bilinear modelling of 
residual intensity variations moved to the reference position. 

When motion data for a given frame and onwards do not 
allow good reconstruction, and/or when the motion data X 
cannot be well reconstructed from the corresponding scores 
and loadings, then it may be assumed that a scene shift has 
occured, and the current subsequence should be divided into 
two different subsequences where modelling should be done 
for each separately. 

Check convergence for the sequence 
End of while sequence estimate not converged 
In summary, in the first preferred embodiment each pass 
though the sequence consists of first estimating hypotheses 
all the frames in EstHypSeq 550, then estimating motion for 
all the frames in EstMovSeq 520, and estimatingftip dating 
the model for the holon in EstModel (590) using all the new 
information simultaneously. 

SECOND PREFERRED EMBODIMENT 

Updating the Bilinear Model After Motion 
Estimation for Each Frame 

In the second preferred embodiment, the bilinear model is 



found to give good fit to the corresponding when x„Hyp is 65 updated after the motion estimation for each frame instead 
applied to the intensity data, are given relatively high of after all the frames have been through motion estimation, 
strengths. Pixels that are deemed to be uncertain because This is described in FIG. 6. 
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Again, it is applied for a subsequence of related frames, 
and for a holon which may represent a full frame or a 
segment. 

In order to optimize the coordination the motion estima- 
tion between the frames in sequence, the system passes one 5 
or more times through the sequence. For each pass it iterates 
through the frames involved. In order to optimize the motion 
estimation for each frame the system iterative ly coordinates 
the estimation of motion (EstMov) with the reestimation of 
hypothesis (EstHyp). While not converged for a frame, the *° 
Reference Image Model is kept more or less constant. Once 
this has converged for a frame, the obtained motion field for 
this frame is used in EstModel to update the bilinear 
Reference Image Model in EstModel. 

The algorithm summarized in FIG. 6 can be described as 15 
follows: 
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Estimate motion in sequence (600): 

While sequence estimate not converged 
For frame n = ImFrames (630) 

From input image data (610) and available model infor- 
mation^ 30), estimate motion (670) and update the 
model (630): 
While frame iterations not converged 

Form hypotheses of the motion field x„Hyp in EstHyp 
(650): 

Estimate motion field for the frame in EstMov (620) 

Check convergence for the iterations for this frame 
End while frame iterations not converged. 

Estimate the bilinear motion model subspace (630) in 
EstModel (690) 
End for frame n = l:nFrames (630) 

Check convergence for the sequence 
End of while sequence estimate not converged 



The second preferred embodiment, in more detail, con- 35 
sists of the following steps: 



Estimate motion in sequence (600): 4q 
While sequence estimate not converged 
For frame n = l.nFrames (630) 

From input image data (610) and available mode 

information (630), estimate motion (670) and update 

the model (630): 

While frame iterations not converged ^ 
Form hypotheses of the motion field x„Hyp in EstHyp 
(650): 



Several hypotheses can be formed, depending on the 
available information: 50 

Temporal forecast: If scores 1^,113=1,2, ... are avaliable 
from previous and/or later frames, from other spatial reso- 
lutions or from previous sequence iterations, and smooth 
temporal motions are expected, then attempt to make a 
temporal forecast from these, using linear prediction, e.g.: 55 

r„Hyp V/>-i+V„-2+ ■ - • 
^Hyp^Hyp^P^ 

so that the predicted value expresses the stationarity inside 
the time series extracted by linear regression of the data on 
the model. 

The statistical uncertainty covariance of this hypothesis 
may also be estimated, based on the estimated uncertainties 
of the scores from the time series modelling, and propagated 
through the loadings: 65 



Cov(*„Hyp>P*Cov(/ fl Hy Pl )*P r 



where CovO^HypJ is some standard statistical estimate of 
the covariance of the temporal forecast. 

Optionally, estimate local depth field of the holon for this 
frame, e.g. by trial and error. Estimate also the intensity 
lack-of-fit obtained when applying this forecasted motion to 
the reference image model. 

Bilinear fit: If a motion field x„ has already been estimated 
in a previous iteration (with its estimation uncertainty 
measures, estimated depth field and alternative motions, and 
the associated intensity lack-of-fit estimates), then estimate 
scores by ordinary least squares regression: 

'^Hyp^/P'O^P)- 1 

or by some weighted or rewcighted version of this. 
Estimate also the corresponding uncertainty covariance 

C^r^yp^-CP^P)- 1 -^ 2 

where s x z is the estimated uncertainty variance of x„. Addi- 
tional covariance may be added due to estimated uncertainty 
of the loadings P. 

As described here, the change information x„ is repre- 
sented e.g. as motion field DAj^ in the reference position, so 
that it is compatible with the bilinear loadings P alsoe 
represented in the reference position. Alternatively, the 
change information may be represented in the position of the 
pixels in frame n, e.g. the reverse motion field DA^j, and 
projected on a compatible version of the loadings P, i.e. P 
temporarily moved the that same position using motion field 

Optimal correspondence between the motion field infor- 
mation for frame n and the model information from the other 
frames in the sequence can be obtained by an iterative 
reweighting scheme. Outlier pixels can be detected and 
downweighted locally by using an iterative reweighting 
scheme, to reduce the effect of occlusions etc on the esti- 
mation of the scores. 

An integration of linear modelling and smoothness 
assumptions is described in the third preferred embodiment. 
A rule-based algorithm for a reweighting scheme that also 
involves modification of the input data to the linear model- 
ling is described in the fourth preferred embodiment. 

Available information about the expected dynamics of the 
motions in the sequence analyzed can be applied to modify 
the obtained score estimate i n with respect to temporal 
smoothness. 

Once the scores t„Hyp 2 =t„ has been estimated, then 
generate hypothesis x„Hyp 2 , e.g. 

x„HYP 2 =f„Hyp 2 *P r 

Generate also a simplified estimate of the statistical 
probability distribution of this hypothesis point estimate, as 
outlined in FIG. 7, resulting in Hypothesis Impact Measures: 
Pixels with particularly high pixel leverage, diagfP^^P)" 1 
P 7 ) and/or frame leverage, diagCT^T)- 1 ! 7 ), and/or abnor- 
mal bilinear residuals E or decoding intensity errors DI, are 
given higher uncertainty that the other pixels. These uncer- 
tainties form the basis for computing the various Hypothesis 
Impact Measures which define how the point estimate 
x„HYP 2 is applied in subsequent motion estimation. In the 
preferred embodiment, the higher uncertainty of a pixel in a 
hypothesis, the lower is its strength 750 and the smaller is its 
Shift Range 730 and Propagation Range 740. 

Hypotheses based on other principles may also be esti- 
mated at this stage, and used in analogy to x^yp^ in the 
subsequent motion estimation. 

Yet other hypothesis principles may be based on the 
assessing the spatial derivative of the motion field x„ and its 
uncertainties: 



12/10/2003, EAST Version: 1.4.1 



6,157,677 

19 20 

Precision dominance filtering x„Hyp3: Modify x n so that A reliability estimate of the estimation motion field(s) is 

at each pixel in x„Hyp 3 for each property (e.g. vertical an d the 'slack* or standard deviation from motion field x„ that 

horizontal) is a weighted average of other pixels that are seems to arise if random noise of a certain standard deviation 

deemed to have relevant information; this serves to let ^ a dded to the intensities I* or I„ from which the motion 

precise motion estimates from easily identifiable picture 5 field was estimated. There could be several such slack 

elements from some parts of the image replace or at least uncertainties of each pel in x„,— left and right for horizontal 

influence the more uncertain motion estimates for less easily uncertainty, upwards and downwards for the vertical 

identifyable picture elements at other parts of the image. The uncertamtVi ^ f orW ard and backward for the depth uncer- 

relevance of one pel with respect to influencing the motion lain UDCertainties may also be given m other dimension 

climate of another pel is a function of the distance between 1Q directi md for WQ or more ^ m&d iQtensil noise 

these pels. This distance is computed in two ways, — in the levels ' 

image space where vertical and horizontal distance is A validitv eslimate of ^ motion fie , d(s) ^ 

counted, and/or in the factor loading space P where sum- „. Ct ^ c u * c ■ • i 

; . \ 7. . rrti * i * ti , the worse intensity fit motion field estimate for a given pixel 

lanty in loadings is counted. This results in x„Hyp., at each j r „. a j- *l ~ * • • •« *u . 

, / . - . _. *■ i , . x„„,, delivers upon decoding, the more uncertain is it that 

pel being a weighted average of its own x„ values and the x M , .i?^ c ij 4 • t a .l i-j-. 

i * .i. i -n. ■ . j -T ■ . c ii 15 this motion field estimate is correct. Another validity esti- 

valuesof other pels ^e associated uncertain y of x„Hyp 3 ,s ma(6 {s tha( ^ {n , ^ ^ , 0 ^ invisibl / m , 

accordingly computed as weighted average of the uncertain- , ,, . . • *• ** 

. * J r J 3 ^, to probably have uncertain motion estimates, 

ties of the corresponding pels in x„. 

Success dominance filtering x„Hyp 4 : At pels for which no 

good motions have been found (as judged by the fit between ^ 

the reconstructed and the actual input image l n ), the motion Check convergence for the iterations for this frame 

estimate may be replaced by motion estimates from other End wnilc f" ramc iterations not converged. 

relevant pels with more successful motion estimates, in ^ / ^ ear motion model subs P ace < 630 > " 

. r . .... . . EstModel (690) 

analogy to the precision dominance filtering above; the 

uncertainties are defined accordingly. Uncertain ties are 

propagated accordingly. This bilinear modelling of the motion data (and 

Physically improbable motion filtering x n Hyp 5 : Image optionally, intensity data) can be done in a variety of ways, 

parts where x„ appears to be physically improbable are The analysis may be performed anew on the motion esti- 

corrected. One such case is that if the spatial derivative of mates of a set of frames including the present frame n, 

the motion field for some pels is higher than 1, then this will 3Q X=[x m ,m= . . . ,n, . . . J, e.g. by weighted QR or singular value 

result in folding if the same motion pattern is amplified. If decomposition of X . 
alternative motions at these pels can be found with about the 

same motion fit, these alternative motions are inserted in Updating Bilinear Models 
xJHy Ps . Uncertainties are based on the probability of the Alternatively, it may be performed by incremental 
different physical assumptions and their fit to the data x„. 3J updating> e g weighted adaptive QR-algorithm based sin- 
Predictions from Other Spatial Coordinate Systems gular value decomposition, or by weighted Nipals principal 
x Hyp 6 component analysis (conf. Martens, H. and Naes, T. (1989) 
Motion field estimates may be obtained at different coor- Muriate CaHbration. J. Wiley & Sons Ltd Chichester 
dinate representation, e.g. a different spatial resolution, and The effect ^ of a new frame n may be added to an old 
transformed into the coordinate representation presently 40 model m tms wa ^ : 
employed. Such alternative estimates may also be used as 
hypotheses, so that the probability of finding motion esti- 
mates that satisfy the different coordinate representations is 
increased. Uncertainties in the other coordinate representa- 
tions are transformed with the motion field data. 45 

, „ „ , „ , If frame n already has contributed to the previous model, 

Estimate motion field x„ for the frame in EstMov (620) n it _ . lt . , • \ ■ j 

v ' P^, then only the difference in x„ (x M -x n , previous) is used 

Estimate the motion field x fl =DA /&l from the reference in this updating, 

frame to frame n, based on the available information: X can be modelled as follows: 

Intensifies I„ and 1^ (or some transformation of 1^ with 50 

known inverse) and their uncertainties, various hypotheses x-usv r +E 
x„Hyp and their impact measures, etc. When occlusions 

between segments (objects, holons) occurs, this should be where matrices U, S and V are computed by singular value 

corrected for in the motion estimation. decomposition of X, and the residual matrix E contains the 

The estimation should yield a simplified statistical 55 non-significant dimensions of X (as judged e.g. by cross 

description of how the probability density function varies validation over pixels), 

with the motion field x„. Then the new loadings axe 

Typically, the output should contain a point estimate x„ 
with values for each coordinate involved (e.g. vertical, 

horizontal, depth). However, it could possibly have more 60 am j the updated scores are estimated from: 
than one such point estimates. 

The point estimate(s) x„ should have uncertainty 'stan- t Tm 0 i 

dard deviation* estimates. This may be based on statistical T »*» = 0 j W 
reliability information (estimation precision, sensitivity to 

noise in the input data) as well as validity information 65 

(indicating if the obtained motion estimate seems applicable The estimation process for P (and T, implicitly or 

or not). explicitly) in its basic form has as goal to describe as much 
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variance/covariance in the (weighted) change data X as FIG. 9 outlines the data structure 670 from the EstMov 

possible (eigenvalue decomposition). But in order to save 620 operator for one frame. It consists of the horizontal 910 

computation time, this process does not have to iterate till and vertical WO motion estimates DV^ and DH^ (and 

full convergence. optionally a depth change estimate). In addition, it may 

However, this estimation process for P and T may also 5 optionally consist of statistical uncertainty information from 

take into account additional information and additional the motion estimator EstMov 620. The reliability of the 

requirements motion estimates is repesented by the sensitivity in the 

An integration of bilinear modelling and smoothness respective motion directions for intensity noise (slack). The 

assumptions is described in the Third Preferred Embodiment oi . ,he mo . ,,on ^unties is represented as the lack of 

A rule-based algorithm for a reweighting scheme that also 10 flt » n < he i«.es resulting when the motion estimate is 

i j « *• * ,l . j 7 . used for decoding I™ (or a transform of it, conf. above), 

mvolves modification of the input data to the bilinear A ..... t - \ ■ t . 4 c i 

modelling is described in the Fourw Preferred Embodiment. Vahdl, y eStUnate 15 the *™*™ X P resence of 0CC,U - 

sions. 

There can be one, two or more slack parameters in the 

End for frame n»i:nFramcs reliability information. Two slack expressions 930, 940 are 

Check convergence for the sequence 1 shown: Up- and down-slack for vertical motion estimate and 

left and right slack for the horizontal motion estimated. Each 
If the changes in motion estimates X, motion model T? T of lhese may represem estimates of how far off from given 

or lack-of-fit to I„n=l,2, ... ,N are below a certain limit, or poim estimate DV and DH the motion estimate could have 

max iterations has been reached, then end the sequence come tf the imensity of ^ were changed randomly by a 

iteration. certain noise standard deviation. Hence they can be seen as 

End of while sequence estimate not converged estimated asymetric standard deviations of the motion esti- 

This algorithm is applied for the whole frame, or, if m ^5' vj . ■ r ■ i j • . i i r c. 

* . • a a ,u .* * • i j l c The validity information includes intensity lack-of-fit 

segmentation and depth estimation is involved, repeated for „. A n<rA n _ A J c Li . ,. 7 . 

each spatial segment fholon) 25 "",960,970 for whatever color space dimensions is 

Ttia kinrtir A- n • xnr- < - a ♦ -\ .u- desired— the example gives this for R,G,B color space. 

The block diagram in FIG. 6 gives details on this iterative T iL j r j V j* . 

balancing between motion estimation and hypothesis esti- In ,he second e fodiment uses what- 

mation for an individual frame. The balancing operator 600 evel b f n ™ SM ' ucncc model >? f ™° n 1S "variable fo ' a 

, , . t . , e c i An jx iu given frame at a given stage of the sequence encoding, for 

takes as input the intensity of a frame, L 610 and the ? , 4 . , ■ 

i_i rj c ■ t i r u- u *u iteratively to enhance the motion estimation for that frame, 

available Reference image model 630 from which the 30 „ , ■ , , ^ . 1 i i • r 

c . , , . 7. tJ rp,. But care is taken so that only model information with low 

motion fields arc to be estimated. This model includes the , . , . . 

r - * t ii I , apparent uncertainty is used in this enhancement, in order to 

reference image intensity l K as well as whatever subspace . . . . , . . . ... . 

i j ■ i j n{ An j ,i c j * ■ * j avoid that estimation errors that invariably will be present in 

loadings or loads P 640 and other frames estimated scores . , • «,.{. 

T 660, and their associated uncertainty statistics that are the blhnear / f '• 7 * f 

available. It delivers motion estimates for this frame at 670, 35 ?* P r0CeSS when ™ As * 15 onlv bas ? d ° D few . P revio f 

j u *u f *u- e * /on n j * J frames, are propagated at the expense of information in the 

and hypotheses for this frame at 680 as well as an updated . I , j r« «_ . * , • ^ 

• r ,u n *■ 1 w j 1 given frame s data. The bilinear sequence model lnforma- 

version of the Reference Image Model 630. ?-.u a . a c *u >• * * * u 

rn ri .rt . £ - n r ... „ , tion is then updated after the motion estimation for the 

The EstHyp operator 650 initially generates hypotheses ~_ J . . - it _ , . . . JjC 

e *u _ *■ 4- u * 1 * 1 *• / frame. The updated version of the model is in turn used for 

for the motion estimation, e.g. by temporal extrapolation/ L iL - . . , , r 

. . , f u c *u * enhancmg the motion estimation or. the next frame, etc. This 

interpolation of results from other frames. 40 ^ c . , 

The EstMov operator 620 estimates the motion field x„ process 18 P** 0 ™* once ^ the f«a of frames m the 

from Reference image I* to I„ using whatever hypotheses s ^ Mnce or re P eated several timcs for the Wncc 

x„Hyp available. THIRD PREFERRED EMBODIMENT 

As long as the i.eratioc , for this frame has not converged, Enhanced BiUnear Modelli Took 

the EstModel module 690 estimates new scores by model- 45 

ling the obtained data x„ in terms of loadings P. When the ^ third P referred embodiment prepresents an enhance- 

iteration for this frame has converged or otherwise is ment of the first or second preferred embodiments, in that it 

stopped, EstModel 690 also updates the loadings P. utilizes a temporal smoothing as part of the linear estimation 

During the iterative process, the EstHyp operator 650 of and spatiotemporal smoothing in the bilinear 

generates new hypotheses for the repeated motion 50 estimation of loading and scores. In addition, it aUows 

estimation, e.g. fitting the preliminary motion estimate x„ to adaptive statistical weighting of the input data so as to 

the available subspace loading P to estimate scores t„ and enhance the statistical properties of the first few 

generating one hypothesis this way. In tne above linear score estimation and bilinear model 

In addition, EstHyp 650 may refine the initial forms estimation each frame generates one line in matrix X, and 

forecasting hypothesis by refined time series modelling in 55 lnere is no concept of temporal continuity in the estimation 

score T space. Other hypotheses bases on smoothness etc. of tQ e scores. 

(as described above) may also be formed in EstHyp 650. The Conversely, in the bilinear model estimation, each pixel 

result in hypotheses x„Hyp are passed back to EstMov 620 generates one variable (one column in matrix X). Once the 

for renewed motion estimation. variables are defined, there is no concept of spatial neigh- 

FIG. 8 outlines the preferred data structure of the output 60 bourhood between the variables— each pixel is treated with- 

680 from the EstHyp 650 for one frame. It includes the point out any regard for where they belong in the actual Reference 

estimates, consisting of vertical 810 and horizontal 820 image. 

motion estimate DVHyp and DHHyp and optionally also In the present invention, spatial and temporal restrictions 

motion in the depth direction. For each of these directions may be included into the linear and bilinear estimation of 

the distribution information for this hypothesis includes a 65 these model parameters. The Third Preferred Embodiment 

Hypothesis Shift Range 830 and Propagation Range 840, as builds these restrictions directly into the parameter estima- 

well as a general Hypothesis Strength 850. tions: 
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Temporal Smoothing of Scores for Fixed Loadings 

In the definition of the forecasting hypotheses in Est- 
HypSeq 550 (FIG. 5) and EstHyp 650 (FIG. 6), the scores 
are forecasted by time series modelling, e.g. an ARMA 
model, and with suitable conservative parameters in the time 
series model this ensures temporal smoothness. 

In contrary, the hypotheses based on fitting data x M for 
frame n to existing loadings P by e.g. equation (3) makes no 
assumptions about temporal smoothness. In the third pre- 
ferred embodiment such temporal smoothness is obtained by 
the following operation: 

Estimate temporal extrapolation/interpolation i n Uyp 1 and 
its covariance, Cov(t„Hypj) as described above. Estimate 
also preliminary temporary scores for the present frame, 15 
t «Hyp Prtfm/£>n , and its uncertainty covariance, Cov 
(inHyVrremtim), as described above. 

Modify t^Hypp^^ towards t„Hypj according to the 
probability that t^Hypp^^ statistically could have had the 
value ^Hypj, as judged from their covariances, e.g. by: 

f-Hyp^-HyPi-^+^Hypp^^i-wj 

where the weight w n is at its maximum, e.g. 0.5, when the 
two hypotheses for the two scores are not significantly 
different, and approachs 0 the more significantly they are 
different: 

w^.S^piobabilityfXHypp,,,** appear to be equal l^HypV) 

or more formally: 

w>„=<X5*(l -probability or rejecting the hypothesis QtJdyp Mim i 
not equal to ^Hyp,')) 



Initialization: 



f«o 

E = v fi; 



•X-Vp, 



Factor number 

Residual • Dually weighted initial matrix, 



20 



25 



30 



where 

v /m««" we ig nt matrix for frames (lines in X), e.g. diago- 
nal and inversely proportional to the uncertainty stan- 
dard deviation of each frame. These weights may be a 
priori estimated on the basis of external information 
(e.g. slack estimation, as described above). Additional 
robustness is attained by recomputing the uncertainty 
variances on the basis of residuals from previous itera- 
tions. 

Vrf,-weight matrix for pels (lines in X), e.g. diagonal and 
inversely proportional to the uncertainty standard 
deviation of each pixel. These may also be given a 
priori and further refined by reweighting. 

Bilinear modelling: 

While not enough factors a: 

f-f+1 

w r -some start loading vector, taken e.g. as the line in E 

with highest sum of squares 
While not converged: 

P r =smoothed version of loading w r , to favour spatial 
continuity; 

1. Estimate uncertainty variance of w, e.g.: 



where s^ 2 -estimated uncertainty variance of data X. 
Additional variance due to uncertainty in the scores t may 



also be added. 
2. Smooth loading w, e.g. 
filtering, e.g.: 



The probability is estimated in a conventional a signifi- 
cance test. 

In this way estimation errors due to uncertainty in the data 35 
x„ and bilinear loadings avalable, P, is balanced against 
uncertainty in the temporal forecast. 

Spatio-Temporal Smoothing of Loadings and Scores in 
the Bilinear Modelling 

In order to facilitate the spatio-temporal smoothing in the 40 
bilinear modelling, a special version of the algorithm for 
principal component analysis by the power method is 
employed. 

The power method for extraction of individual factors is 
in some literature termed the 'NIPALS method* (see 
Martens, H. and Naes, T. (1989) Multivariate Calibration. J. 
Wiley & Sons Ltd, Chichester UK). 

To estimate a new factor a from data X in the NIPALS 
principal component algorithm, the effects of previous fac- 
tors 1,2, . . . ,f-l are first subtracted from the data matrix. 
Then, in an iterative process the t scores for the new factor 
are estimated by projection of the each row in residual 
matrix X on preliminary values of its loadings p. Then the 
loadings p are conversely estimated by projection each 
column of residual matrix X on the obtained preliminary 55 
scores t. A factor scaling is performed, and the process is 
repeated until the desired convergence is reached. 

This is normally done for each individual factor f-1,2, . 
. . , but it can also be done for several factors at one time, 
provided that the factors are orthogonalized to ensure full 60 
subspace rank of the solution. 

In the present invention a smoothing step (followed by 
reorthogonalizabon) is be included both for the spatial 
loadings as well as for the temporal scores. 

The preferred embodiment of the doubly smoothed 65 estimation based on the input data X, 
NIPALS algorithm for modified principal component analy- 

S1S is ; Scale p T so that p T p=\ 



by low pass convolusion 



45 



50 



where S^ is a Low Pass smooting matrix. 
A more advanced smoothing takes tentative segmentation 
information as additional input, and avoids smoothing 
across the tentative segment borders. 
3. Combine the unsmoothed and the smoothed loading: 

where v f is a weight One embodiment is to define an 
individual weight for each pixel, vf so that it is at its 
maximum, e.g. 1.0, when the pixel's smoothed loading 
w smoothed j>ei is not significanty different from its 
unsmoothed loading w^. The weight approaches 0 the 
more significantly w^^^, is different from w^,: 

v //W =l"(:i -probability of rejecting the hypothesis (w MmooMtpei is 
not equal to w^' )) 

The probability is estimated in a conventional a standard 
significance test of (w^-w^^^) vs. the pixel's esti- 
mated uncertainty standard deviation, s^,. 

Thus, in this implementation, the smoothing is only 
applied to the extent it does not statistically violate the 
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Compute preliminary score estimates: 
K»E*p 

t=smoothed version of score vector u, to favour temporal 
continuity. The smoothing in the present embodiment is 
done in analogy to the one for the loading: it is only 
applied to the extent that it docs not statistically violate 
the estimation based on the input data X. 

w^ot^E (preliminary loading estimates) 

Check convergence w.r.t change in t since last iteration 

end while not converged 

q=(P r *P)" ljPr *w (Project w on previous loadings to 

ensure orthogonal loading set) 
Orthogonalize this factor loading on previous factors: 

p=w-?~q 

Scale P r to a constant length so that p r *p=l 
Include p in P 

Estimate scores for this factor loading: 

t=u or optionally a smoothed version of u. 
Include t in T 

Subtract the effect of this factor: 

Check if there are enough factors, e.g. by cross validation. 

end while not enough factors 

Deweighting: 

Unweighted scores«V /rames _1,,, T 



26 



Unweighted loadings-P^V 
Unweighted residuals=V- 



pels 



pels 



20 



25 



35 



Robust Statistical Version of this Smoothed 
Bilinear Modelling is Attained by the Following 
Reweighting Scheme: 

Like the linear regression estimation of scores from 
known loadings, the bilinear estimation of both loadings and 
scores may be implemented in a robust way: 

New weights V /r?m „, V pels may now be calculated from 
the unweighted residuals, after suitable correction for the 
parameters estimated, e.g. after leverage-correction (see 
Martens, H. and Naes, T. (1989) Multivariate Calibration, 
J.Wiley & Sons Ltd, Chichester UK), and the bilinear 
analysis may be repeated. Particular requirements may be 
induded, e.g. that frames that appear to have large, but 
unique variance, i.e. strong variation pattern not shared by 
any other frames in in X, may be down-weighted in order to 
ensure that the first few factors bring out the statistically 
most suitable or reliable factors. 

Pyramidal Modelling 

The bilinear modelling in motion estimation may be used 
pyramidally in space and time. One embodiment of spatial 
pyramidal operation is to perform this motion estimation, 
bilinear modelling and spatial segmentation on frames in 
lower resolution, in order to identify the major holons in the 
sequence, and then to use the scores and the spatial param- 
eters (after suitable expansion and scailing) as preliminary, 
tentative input hypotheses to the same process at higher 
frame resolution. One embodiment of temporal pyramidal 



40 
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60 



operation is to perform motion estimation, bilinear model- 
ling and spatial segmentation first on a subset of frames, and 
use interpolated scores as generate tentative input hypoth- 
eses for the other frames. 

Multi-Holon Modelling 

In the preferred embodiments, the motion estimation and 
bilinear modelling may be performed on individual, already 
identified holons ('input holons'), or on complete, unseg- 
mented images I n . In either case, a multi-holon post pro- 
cessing of the obtained motion fields, bilinear models and 
segments is desired in order to resolve overlap between 
input holons. 

One such post processing is based on having stored each 
holon with a 'halo* of neighbour pixels with uncertain holon 
membership, — i.e. that only tentatively can be ascribed to a 
holon (and thus is also temporarily stored in other holons or 
as separate lists of unclear pixels). In the motion estimation, 
such tentative halo pixels are treated specially, e.g. by being 
fitted for all relevant holons, and their memberships to the 
different holons updated according to the success of the 
motion estimates. Such halo holons are given very low 
weight or fitted passively (e.g. by Principal Component 
Regression, see Martens, H. and Naes, T. (1989) Multiva 
rate Calibration. J.Wiley & Sons Ltd, Chichester UK.) in the 
bilinear modelling. 

Extra Variables 

Additional columns in the data matrix X may be formed 
from 'external scores' from other blocks of data. Sources of 
such 'external scores' are: 

scores from bilinear modelling of some other data 
domain, 

(e.g. motion compensated intensity residuals of the 
same holon), or 

scores from the same holon at a different spatial 

resolution, 
scores from other holons, or 
scores from external data such as sound 

(e.g. after bilinear modelling of sound vibration energy 
spectra of these same frames). 
The weights for such additional variables must be adapted 
so that their uncertainty level become similar to those of the 
weighted pixels in the final data matrix to be modelled, X. 

Hierarchical Bilinear Modelling of Motion Data 

An alternative way to incorporate uncertain pixels or 
extenal scores gently, without forcing their information into 
the bilinear model, is to replace the one-block bilinear 
modelling with a two- or more -block modelling, such as 
PLS regression (Martens, H. and Naes, T. (1989) Multivari- 
ate Calibration. J. Wiley & Sons Ltd, Chichester UK.) or 
Consensus PCA/PLS (Geladi, P., Martens, H., Martens, M., 
Kalvenes, S. and Esbensen, K. (1988) Multivariate compari- 
son of laboratory measurements. Proceedings, Symposium 
in Applied Statistics, Copenhagen Jan. 25-27, 1998, Uni-C, 
Copenhagen Danmark, pp 15-30. In this way, the uncertain 
pixels and external scores contribute positively to the mod- 
elling if they fit well, but do not affect the modelling strongly 
in a negative way if they do not fit. In any way these 
uncertain pixels and external scores are fitted to the obtained 
bilinear model. 

The scores from the present holon's modelling in the 
present resolution and in the present domain may in turn be 
used as 'external scores' for other holons or at other reso- 
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lutions or in other domains, as shown in the Consensus used to ensure maximum inter-frame coordination as motion 

PCA/PLS algorithms (Geladi, P., Martens, H., Martens, M., data for more and more frames become available: The 

Kalvenes, S. and Esbensen, K. (1988) Multivariate compari- weight and/or the value of individual point estimates x^., 

son of laboratory measurements. Proceedings, Symposium with particular uncertainty or particular ambiguity are modi- 

in Applied Statistics, Copenhagen Jan. 25-27, 1998, Uni-C, 5 fled. 
Copenhagen Danmark, pp 15-30.) 

Such hierarchical multiblock modelling may also be used Weighing of Individual Data Elements 

for data from other domains, such as motion compensated One way to alter the impact of individual data elements is 

intensity change data. t 0 ascribe special weights to them in the linear regressions 

FOURTH PREFERRED EMBODIMENT " 10 ^T* ^ ° r ^f** Way daU 

assessed to be particularly unreliable are given weights 

Individual Weighting and Delayed Point Estimation lower ,han ,hal expected from the product of row weights 

of Data Elements an£ ' col 1 ™ 11 weights, i.e. they are treated more or less as 

missing values. Conversely, data elements judged to be 
In tbe linear and bilinear modelling stages described in the 15 particularly informative may be given higher weights, 
three first preferred embodiments the motion estimation data For the regressioD of a frame . s motion fleld on known 
XHX. , n-lA • - • ] were taken for granted as input to the ^ to ^ frame>s ^ and for ^ _ 
statistical parameter estimat.on Statistical optimization or sjon of , ^ motjons 0Q estimate , he 
robustificauon against errors m X were attained in the Third loadi ^ workg we „ Fof sin ^ e . factor 
Preferred Embodiment by a) including additional restnc- 20 modeUin i( can also WQrk weU 
tions (spatotemporal smoothing) and/or b) including weight- . . 
ing and reweighing for the rows and columns of X. But the However, such internal detailed weighting violates the 
data elements in X were not weighted individually. Nor were ^metric assumptions behind the known bilinear estima- 
te the actual values in X themselves affected during the * 10n algorithms. Therefore, when more than one factor is to 
modelling process 25 expected in X, it may lead to convergence problems in the 
* t , bilinear modelling, and to unexpected and undesired param- 
In some cases there is a need to alter the impact of etef vanies 

individual data elements in X for frame n, pixel pel: x nMt . n , , . , „ 

For instance may some data elements be known or believed Several alteraative wavs t0 reduce me detrimental effect 

a priori to be particularly uncertain, either due to occlusions, °/ outhers u and missm g values ma V be u L sed ^tead of the 

or because they give rise to very large individual outlier 30 down-weighing method above, as described e.g. in Nonlin- 

residuals in E, in preliminary linear or bilinear ™ r M ^ V ™ ai ? Analysis, Albert Gifi (1990), J. Wiley & 

modelling, or because they display abnormally high indi- Sons Ltd ' Chichester UK. 

vidual intensity errors upon decoding. An alternative version of the fourth preferred embodiment 

Trie fourth preferred embodiment can then either apply „ modifies the aclual values themselves, instead of just the 

individual down-weighting, rule-based modification of the 35 statistical weights, of individual data elements x„^, in input 

data values, or combinations of these for such particulary malrix X ( the estimatimated motions). 

questionable data elements in X. Collectively, these tech- * n_ x/i trj-j ^ * 

. . . , i c i- * Modification of the Value of Individual Data 

mques are here termed 'Opbmal Scaling . Elements 

More generally speaking, the fourth preferred embodi- 40 

ment can be used in conjunction with the three previous When the uncertainty range can be estimated, the fourth 

preferred embodiments, and makes them more compatible preferred embodiment also modifies the values of individual 

with the over-all goals of the invention: The improved data elements so that they correspond better to the values 

motion estimation and the improve motion modelling by the from other pixels and other frames, as judged from linear or 

coordination of motion estimation for several frames via 45 bilinear modelling. An important feature of the rules gov- 

bilinear models. era ing this modification is that the data are only allowed to 

Motion estimation is usually an underdetermined process. be chan g ed wit ™° their own uncertainty range. Thereby the 

Therefore motion ambiguities will unavoidably result in information content of the input data is not violated, yet an 

estimation errors in the point estimates (estimated values) improved inter-frame and inter-pixel coordination is 

for motion estimates da^ early on in an estimation process, 50 attamed - 

These errors will only manifest themselves later in the The higher the uncertainty of an input point estimate x rKpcl 

sequence, and by then it may be too late: The early errors is deemed to be, the more is its value allowed to be 

have already been brought into the bilinear model, which influenced from the information in other, more certain 

later has been used in order to minimize the motion ambi- points. The influence comes via linear or bilinear model 

guity in subsequent frames. Therefore these early errors may 55 reconstructions. 

be propagated in an undesired way and be an unnecessary As described in FIG. 8, the uncertainty range of the data 
hindrance to effective inter-frame coordination of motion elements is constructed from two types of measures: validity 
estimation. Typically, the number of required bilinear factors (is the obtained point estimate x nj)el relevant?) and rehab il- 
required for adequate modelling of the motion data becomes ity (how precise is the value of the point estimate x w/ ^ ?). 
too high. 6 q The validity of a pixel's estimated motion in a frame n is 
In the fourth preferred embodiment, this problem is preferably estimated from A) the size of its intensity lack- 
solved by down-weighing of individual uncertain data, of-fit error upon decoding (850, 860, 870), as well as B) an 
and/or by the technique of 'Delayed point estimation'. The assessment of the probability that it does not represented 
motion field for each frame n=l,2, . . . nFrames is estimated occluded, invisible objects (880). A pixel whose intensity in 
and stored, not only with respect to its seemingly 'best' value 65 the reference image does not correspond at all with the 
(its point estimate) x n>pel , but also with respect to other intensity of the pixel it is assumed to move to in the frame 
statististical properties. These statistical properties are then n, is considered highly invalid w.r.t its preliminary motion 
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point estimate x^^. This motion point estimate should 
therefore not be allowed to have impact on the bilinear 
modelling, and may instead be modified to adhere more 
closely to the motion patterns found on the basis of more 
valid data points. Likewise, a pixel that represents a segment 
in the reference image that appears to be hidden behind 
another segment in frame n, is also considered invalid and 
treated accordingly. 

The reliability of a pixel's estimated motion in a frame n 
is preferably estimated from: 

a) Slack estimation: estimation of how much the prelimi- 
nary motion estimate may be changed before it has 
unacceptable consequences for the decoding of the 
image (830, 840), and 

b) Lack of fit to bilinear model in earlier iterations in the 
linear or bilinear modelling. 

This handling of individual data elements may be used 
both in the linear and bilinear modelling. For example, using 
this principle, the pseude-code of the second preferred 
embodiment would be modified as follows (detailed expla- 
nations are given later): 



10 



20 



Estimate motion in sequence (600): 
While sequence estimate not covered (1000) 
For frame n = l:nFrames 

From input image data (610) and available model 

information (630), estimate motion (670) and update 
the model (630): 
Form start hypotheses of the motion field x„Hyp in 
EstHyp (650) 
While frame iterations not converged 

Estimate motion field x n for the frame in EstMov (620) 
Modify the estimated motion for frame n: (1005) 
While rule-based regression iteration not cod verged (1010) 
Determine uncertainty of pixels in x„ based on validity 

and reliability estimation (1020) 
Determine regression weights for pixels based on 

uncertainty of (1030) 
Estimate scores t„ by weighted regression of Xj, on 

loadings P T (1040) 
Reconstruct motion field xJUi = t^P' (1050) 
Modify values x n =f( x^Hat, uncertainty of Xq) (1060) 
Check convergence of rule-based regression iteration: Is 
t, stabile? (1070) 
End While rule-based regression iteration 
Form hypotheses of the motion field XjjHyp in EstHyp (650) 

Check convergence for the iterations for this frame 
End while frame iterations not converged. 
Estimate the bilinear motion model subspace (630) in 

EstModel (690): 
Modify the estimated motions for many frames: (1100) 
While rule-based bilinear x- modelling iteration has not 
converged (1110) 

Determine uncertainty of each elements Xi, pel,i-l,2,...,n 
in x (1120) 

Determine least squares wieght for frames, pels and 

individual data elements in x (1130) 
Estimate scores T and loadings P (inch rank) from 

weighted bilinear modelling of x (1140) 
Reconstruct motion field matrix XHat=T*P T (1150) 
Modify values X=f(XHat, uncertainty of X) (1160) 
Check convergence for the rule-based bilinear x- 
modelling: Is T stabile? (1170) 
End while rule-based bilinear modelling iteration 
End for frame n - l:nFrames 

Check convergence for the sequence 
End of while sequence estimate not converged 

An implementation of slack information structure was 
illustrated in FIG. 8. Slack may be assessed in various 
directions; in the following example it has been assessed 
horizontally and vertically. 

FIG. 9 illustrates one simple, but effective use of slack 
information for four pixels: The pixel points a 905, b 925, c 



945 and d 965 represent the position of the pixels after 
having been moved with the preliminary point estimates x a , 
Xfc, x c , respectively. The rectangles 910, 930, 950 and 970 
around the pixels represent the areas within which the 
5 motions x at x b , x c , x d may be changed without generating 
significant intensity errors, e.g. intensity errors relative to 
the frame to be reconstructed, I„ that could not have arisen 
randomly due to thermal noise in I„. 

FIG. 9 shows that the motion estimate for pixels a 905 has 
very asymetric uncertainty ranges, as represented by the 
rectangular slack range 910: While motions further upwards 
or further to the left would give bad fit for this pel, motions 
may be modified downwards and especially far to the right 
without causing bad intensity fit. Such effect could arise e.g. 
when the frame to be reconstructed, I„ has a steep intensity 
gradient just above and to the left of position a, while being 
very flat below and to the right of position a. Therefore e.g. 
the preliminary horizontal motion point estimate dh„ a may 
be altered to the right, but not to the left, and preliminary 
vertical motion point estimate dv n a may be altered down- 
wards but not upwards in the figure. Accordingly, the motion 
estimate for pixel a 905 might have been changed to point 
915 without causing significant intensity errors. Pixel b 
likewise has large and assymetric uncertainty range. Still, 
the motion estimate for pixel b 925 cannot be changed to 
point 935 without violating the estimated motion informa- 
tion for this pixel. 

The small rectangle 950 around pixel c 945 shows that for 
this pixel the preliminary motion point estimate cannot be 
changed much in any direction before an unacceptably high 
intensity lack-of-fit would be found. This could be the case 
because the intensity of the frame to be constructed, I M , has 
steep gradients in all directions from point b. Still, the 
motion estimate for pixel b 945 may be changed to point 955 
without causing significant intensity errors. Pixel d likewise 
has narrow uncertainty range. Its motion cannot be changed 
from its estimate 965 to point 975 without violating the 
estimated motion information 970 for this pixel d. 

This uncertainty range information may be used for 
delayed point estimation — i.e. for changing the values of 
preliminary point estimates x„ to ensure increased compa- 
rability of motion data for several frames within the ambi- 
guity of the individual motion estimates. 

The rule based 'Optimal Scaling' technique can be 
applied at different stages during the motion estimation to 
optimize the compatibility: 1) within the motion estimation 
for a frame (steps under 1000), and 2) within the remodel- 
ling of the sequence motion model (steps under 1100). 



25 



35 



40 



45 



50 
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Modifying the estimated motion for frame n (1005): 

In case 1) x„ is regressed on the subspace loading matrix 
P spanning the apparent systematic variations of other 
frames. The projection of x„ on P (step 1040) for this frame 
results in certain factor scores t„. These in turn generate the 
bilinear reconstruction x„Hat=t /1 *P r (step 1050), used itera- 
tively (step 1060) as input to a renenewed motion estimation 
for this frame. In FIG. 9, if a pixeFs bilinear recontruction 
value in x„Hat falls inside its acceptable range (for example 
at points 915 and 955), the hypothesis value can be regarded 
as being as good as the original x„, value for this pixel, and 
are therefore inserted into x n . 

On the other hand, if the bilinear reconstruction value 
x M Hat falls outside the acceptable range around the motion 
estimation value, then the bilinear reconstruction value 
cannot be used. This is illustrated by points 935 and 975. In 
such cases one may then either keep the motion estimates in 
x„ unmodified (here: 925 and 965) as the best estimates, or 
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replace the elements in x„ by the value that is closest to the a three-way or higher-way linear system by assuming a 

bilinear reconstructions but lying inside the acceptable range linear time series model for the scores or a linear spatial 

(938 and 978). In some cases the motion estimate for a pixel forecasting model for the loadings, or a linear factor analytic 

in frame n (e.g. 905, 925, 945 or 965) is expected to be model for the color channels. This can give improved 

particularly uncertain e.g. because of a validity problem: it 5 motion stabilization as well as improved over- all compres- 

seems to reflect an object rendered invisible by occlusion in sion. Alternatively, bilinear methods that seek to combine 

frame n. In such cases the modified value of x^^, may be bilinear structures from more holons, more image resolu- 

allowed to be closer to the bilinaer reconstruction x njjeI Hzl, tions etc. may also be used. The Consensus PCA NPLS 

even though this reconstruction falls outside the apparent (Geladi, P., Martens, H., Martens, M., Kalvenes, S. and 

reliability range (e.g. changing pixel d from value 965 to a io Esbensen, JC (1988) Multivariate comparison of laboratory 

value close to 975). measurements. Proceedings, Symposium in Applied 

^ . w , . Statistics, Copenhagen Jan. 25-27. 1998, Uni-C, Co pen- 
Repeated Regression on Modified Motion Vectors , ' * onx . u u ^ 
^ & hagen Danmark, pp 15-30) is one such alternative. 
Errors in the motion vector cause errors in .he scores ^ methods than ^ additive bilinear mod . 
t,, obtMned by regressing x on loadm^ P. After the above „ ^ us fof mixed adduive _ 
modification , increases the fit of x„ to subspace P, a renewed multi *, icative modeUing . 0 ne such alternative, which may 
regression (1040) may be expected to gave new scores .„ be as * si rior t0 bilinear modelling, is 
with lower errors. Thus, the score estimates !„ may now be MuIli Ucative si al Correction (MSC) and its extensions, 
refined by again projecting the modified motion vector x on ag dtscribcd in Marte H and N T . (1989) Multvariale 
the loadings P, the rule based modification of the motion data M Calibralion j wil & Sons Lld) cbich V ester (jk. 
again applied, and this iterative regression process repeated 

for as long as is desired. In each new score estimation, new ^ e ij se 0 f Pseudofactors 
weights for the pixels may be used. One implementation of 

this is to weigh down those pixels more or less inversely When good a priori loadings are known, these may be 

proportional to their distance DlST nj?el to the acceptable 25 used instead of or in addition to the loadings estimated as 

range (937, 977), e.g. weighty iPel = 1/(1 +D\SJ nj?el ). described above. In particular, the loadings corresponding to 

r. . j w . *• < i f , affine motions may be used. 
Repeated Motion Estunation with Improved 

Hypotheses FIFTH PREFERRED EMBODIMENT 

After convergence of the above regression iteration, the 

modified values of x„ are inserted into hypothesis x„Hyp 30 Combined Motion Modelling and Intensity 

(step 1050), which is then supplied for a renewed motion Modelling 

estimation for this frame (step 650), and this iterative motion ^ ^ { context) motk)n estimatl0Q 5etween two 

estimation process is repeated for as long as is deseed. frames ^ ^ same objectS) say (he reference &ame 

Tne final motion estimation x„, then represents a different 35 R and frame n> conceras comparing the intensities of two 

result than the initial motion estimate x„ for this frame, and frameS) ^ vs ^ under various assumptioils about whe re in 

the modifications give better coordination with the motion frame n lhe ob j ects from f rame R ha ve moved to. However, 

estimate information from other frames, without significant tf the an ob j ect > s intensity itself changes between frame R 

loss of intensity correctness for frame n itself. If the results and frame n> and this mtens i ty change ^ not corrected for, 

in FIG. 9 represented this final motion estimation iteration 4Q then tnese mte nsity changes may mistakenly be treated as 

for pixels a, b, c, and d, then their motion estimates (905, motions and an inefficient modelling may be the result. 

925, 945, 965) might be replaced by values (915, 938, 955, n . . . . A A \ v f • ♦ •» 

g_ 8 ! 7 & r J v Conversely, the estimation and modelling or intensity 

^ changes in the present context consists of comparing inten- 

Modify the estimated motions for many frames (step 1110): sities of the reference image with the intensity of frame n. If 

The algorithm for the fourth preferred embodiment above 45 an object in frame n has moved relative to its position in the 

shows how a similar rule-based modification of the motion reference frame, and this motion is not compensated for, it 

data can be applied during the estimation of the loading may mistakenly be treated as intensity change, and an 

subspace R In an inner iteration for improved bilinear inefficient modelling may again be the result. 

modelling, the sequence motion data to be modelled, X, are The present embodiment employs bilinear modelling in 

modified in step 1160 according to previously estimated 50 the motion domain and/or in the intensity change domain to 

bilinear reconstructions XHat (step 1150), to ensure better minimize such mistakes. 

internal coordination within the uncertainty ranges, and the i n t he first version of the embodiment, motion estimation 

bilinear model is then updated. is improved by bilinear intensity change modelling: It 

In an outer iteration for modelling the whole sequence assumes that one has established a bilinear intensity change 

(step 1000), motion hypotheses based on bilinear motion 55 mode ] (consisting of intensity scores and loadings), e.g. 

model is used for enhancing the motion estimation for the based on prior know edge or by PCA of the intensities I„ of 

frames, and the obtained motion estimates are used for a ^ Q f f rames where the light intensity of the objects 

updating the bilinear sequence model. In conjunction with change, but the objects do not move relative to the reference 

the first preferred embodiment, this outer iteration is done i mage ^ version consists of the following steps: 

each time the whole sequence of frames has been analyzed 60 r, A . fr ^ . t . oami ^„™ 

r t i « i « r .t^i,. • • For each frame in the sequence 

for motion. In the Second Preferred Embodiment it is „ „ . , r , . , 

preferably done progressively, each time a new frame has Estimate the frame s intensity change scores 

been motion analyzed. (eg * by extrapolation/mterpolation from the intensity 

scores of other frames) 

Other Modelling Methods 65 2 . Compute the intensity change DI^ for this frame as the 

The rank-reducing bilinear modelling was above applied product of its intensity change scores and the intensity 

to the two-way frames x pels system. It may be extended into change loading matrix 
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3. Generate for this frame an intensity corrected reference 
frame as 

4. Estimate the motion field DA^ from C R to I„, e.g. by 
one of the methods described in this report. 

In the second version of the embodiment, intensity change 
estimation is improved by bilinear motion modelling: It 
assumes that one has established a bilinear motion model 
(consisting of motion scores and loadings), e.g. based on 
prior knowledge or by PCA of the motion fields DA^ of a 
set of frames where the objects move, but the light intensity 
of the objects do not change relative to the reference image. 
The second version consists of the following steps: 

For each frame in the sequence 

1. Estimate the frame's motion scores 

(e.g. by extrapolation/interpolation from the motion 
scores of other frames) 

2. Compute the motion field DA^ for this frame as the 
product of its motion scores and the loading matrix 

3. Use the motion field DA^ to generate the motion 
corrected intensity change, e.g. by moving (warping) I n 
back to the reference position: 

Jn-MoveBack(I„ using DA^) 

4. Estimate intensity change at the reference position: 

In the third version of this embodiment, the first and the 
second version are combined sequentially: It assumes that 
one has established a bilinear intensity change model 
(consisting of intensity scores and loadings), e.g. based on 
prior knowledge or by PCA of the intensities I„ of a set of 
frames where the light intensity of the objects change, but 
the objects do not move relative to the reference image. The 
third version consists of the following steps: 

1. Estimate motion fields DA^ for one or more frames 
according to the first version, using the bilinear inten- 
sity change model. 

2. Estimate or update a bilinear motion model from these 
motion fields. 

3. Estimate intensity change fields DI^ for one or more 
frames according to the second version, using the 
obtained bilinear motion model. 

In the fourth version of this embodiment, the second and 
the first version are combined sequentially: It assumes that 
one has established a bilinear motion model (consisting of 
motion scores and loadings), e.g. based on prior knowledge 
or by PCA of the motion fields DA^ of a set of frames 
where the objects move, but the light intensity of the objects 
do not change relative to the reference image. The fourth 
version consists of the following steps: 

1. Estimate intensity change fields DI^ for one or more 
frames according to the second version, using the 
bilinear motion model. 

2. Estimate or update a bilinear intensity change model 
from these intensity change fields. 

3. Estimate motion fields DA^ for one or more frames 
according to the first version, using the obtained bilin- 
ear intensity change model. 

The fifth version of this embodiment consists of iterating 
between the first and second version of the embodiment, 
with an updating of the bilinear models in between. The 
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starting step can be chosen to be version 1 or version 2. In 
this example, version 1 is the starting step. A prior bilinear 
intensity change model is then established e.g. as described 
above, and the fifth version consists of the following steps: 

1. Estimate motion fields DA^ for one or more frames 
according to the first version, using the bilinear inten- 
sity change model. 

2. Estimate or update a bilinear motion model from these 
motion fields. 

3. Estimate intensity change fields DIj^ for one or more 
frames according to the second version, using the 
bilinear motion model. 

4. Estimate or update a bilinear intensity change model 
; from these intensity change fields. 

5. Check convergence: e.g. are the motion scores stabile? 

6. Repeat steps 1-5 until convergence 

The sixth version of this embodiment is similar to the 
fifth. But bilinear models are assumed to exist both for 
20 intensity change and for motion, and their loadings are not 
updated inside this version. The sixth version consists of the 
following steps: 

1. Estimate motion fields DA^ for one or more frames 
according to the first version, using the bilinear inten- 

25 sity change model. 

2. Estimate intensity change fields DI^ for one or more 
frames according to the second version, using the 
bilinear motion model. 

30 3. Check convergence: e.g. are the motion scores stabile? 
4. Repeat steps 1-3 until convergence. 
After the first iteration in the iterative versions 5 and 6, the 
intensity change scores may be estimated by regressing the 
estimated motion compensated intensity change field DI^ 

35 on the intensity change loading matrix. Likewise, the motion 
scores may after the first iteration be estimated by relating 
the estimated motion field DA^ to the motion loading 
matrix, either by regression or by nonlinear iterative mini- 
mization. In the latter case, the criterion to be minimized 

40 may be a function of the residual intensity error after 
subtraction of the estimated effect of the bilinear intensity 
change model from the motion compensated intensity 
change field DI^. Additional constraints may be included in 
the criterion, e.g. in order to guard against meaningless 

45 solutions such as motion fields reducing the motion com- 
pensated DI^ to abnormally few pixels in the reference 
image. 

Constraints in the corrections: For optimal efficiency, this 
embodiment may be operated with certain constraints on the 

50 motion estimates and the intensity change estimates. In the 
fifth version of the embodiment these constraints may be 
increasingly relaxed as the iterations proceed. 

On one hand, the constraints on the intensity corrections 
in the motion estimation may be such that only intensity 

55 correction that does not appear to reflect unmodelled 
motion, or otherwise does not introduce artifacts in the 
motion estimation, is applied. This means that, particularly 
early in the iterative process, bilinear intensity change 
information that does not have large scores for more than 

60 one frame or a small group of adjacent frames is scaled 
towards zero, and/or the intensity corrections are smoothed 
spatially. 

On the other hand, the constraints on the motion com- 
pensations in the intensity change estimation are such that 
65 only motions that do not give unexpected folding effects are 
allowed; this means that particularly in the beginning of the 
iterative process, the motion compensation fields are 
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smoothed to avoid folding unless clear indications for valid for each object may also be used). The obtained parameters 

occlusions are established. of the bilinear models in the end provides detailed qualita- 

The methods described above for the present embodiment tive and quantitative information about the found object, 

may be applied in a pyramidal fashion. One example of this Application using affine motion estimation: Systematic 

is that the prior bilinear models have been estimated at a 5 variations in the pattern of an object to be searched for is first 

different spatial resolution and just scaled to correct for the approximated by bilinear modelling in the intensity domain, 

resolution differences. based on a set of known images of the objects. Then, in order 

The said methods, like all the other methods presented to find this object in an unknown image, this model is 
here, may be applied repeatedly for a given sequence of applied repeatedly at different positions. This allow auto- 
frames. 10 matic correction for known systematic variations without 

Simultaneous dual domain change estimation and mod- loss of too many degrees of freedom and without too much 

elling: On one hand motion estimation and bihnear multi- computational requirements, 

frame motion modelling is performed on the basis of inten- Example of e.g. a face: 

sity corrected images. On the other hand intensity change Calibration: A number of images of different faces are 

estimation and bilinear multi-frame intensity modelling is 15 rccorc icd in order to estimate a Reference model, 

performed on the basis of motion compensated image inten- Calibration motion compensation: The images may 

SlllCS * • , , • i optionally have been normalized by affine transformations 

Resiproke domain corrections based on the bilinear mod- SQ ^ tQ iyc maximal 0VCfl of ^ momh ctc Morc 

els: One one hand, the intensity correction used in the details 0Q such mQtioQ ampeasrion is given in the fifth 

motion estimation is based on the best available bilinear 20 ferred embodiracnt> as wcll as m WO95/08240 Method 

intensity model, but subjected to additional constraints. On and ms for data anal ^ which is ncrcb deluded by 

the other, the motion fields used for the address correction reference 

(motion compensation) in the intensity change estimation ~ \. . . .. , „. ™_ . . .. . 

>. if j I . .... .. Calibration intensity modelling: The intensity in 

iteration are based on the best available bilinear motion L i i o t_v • ■ i i_ i lL ■ 

. . . . .... , . • . „ black&white or in vanous color channels are then appro xi- 

model, but subject to additional constraints. 25 . , , .... . . . , .,. ™ . . . £ * 

^ . . . . #u . ~ uj.u mated by bilinear intensity modelling. This intensity Refer- 

Constraints in the corrections: On one hand, the con- *\ „. , . c ~ f , .. . . . 

..... . . , . ' .. ence modellmg may consist of first selecting one typical 

straints on the intensity corrections, to be used in the motion - tL *nr c uu • • r 

i / ... , . iace, — the Reference iace, — it could be one given image oi 

estimation, are such that only intensity correction that does * . ^ f & WJ . 

„ . . ii . J , ! one given person, or some aggregate of images of several 

not appear to reflect unmode lied motion or edge mterpola- ^ ™f 4 , ... j ™ r 

. . f j .i- . t i i * persons. Then the variations around this average Reference 

tion effects, is applied, lnis means that, particularly early in 30 . . . „ . , . . . ° . . 

...... rr .... . . .. . •/ ;. image may be modelled by principal component analysis, 

the iterative process, bilinear intensity change information . r . , . c ... . t c . . , . 

, i i /. . c r retaining only the significant intensity factors, as ludged e.g. 

that does not have large scores tor more than one frame or . i-j *■ t» li tL . c ~. 

„ rj- .r ij. j by cross validation. Presumably, the normalized faces used 

a small group of adjacent frames, is scaled towards zero, .... ... , . / ■ .i 

..^ . . . j ,. - in this calibration have been chosen sufficiently different so 

and/or these intensity corrections are smoothed spatially. On i» . L , j t j. * 

.« i j .< . ■ . .. ^ ;. as later to enable adequate predictive approximation of 

the other hand, the constraints on the motion compensations, 35 ., c c ! i *■ u 

, . j. . . . . .... r . . many other frames from the same statistical population by 

to be used in the intensity change estimation, are such that . . , ^ . - . . . . . r . ^. „ / 

. .. .... . • . , r a- . interpolation. Details of how to build such a bilinear Ref- 

only motions that do not give unexpected foldmg effects are A , • „■ ^ „ • u ^ onr n KT „„ 

„ J j _ . & ^. . , . . . . r . L erence calibration model is given e.g. in Martens & Naes 

allowed. This means that particularly in the beginning of the inor4 . , , A °.,. 4 . , . „ . , 

. r . J . ■ r- fj 1989, mentioned above. Additional artificially created 

iterative process, the motion compensation fields are , , „. r . . .... , 

* . ' , . . . ..... c fj loadings, modellicg e.g. varying light conditions, may also 

smoothed to avoid folding unless clear indications for valid 40 ,. f , . & . f . / .. , ' 

. . . ... . t be included in the set of intensity loadings. 

occlusions are established. „ . ^ ^ t , , .? - - 

Downweighting uncertain information in the modelling: , Prediction: To find the unknown position of a new face 

In the bilinear modelling, pixels and pixel regions that are ^ lhe same s ^ l f lical P°P^u, the obtained calibration 

detected to have particularly high uncertainty, e.g. due to results are u used for . simultaneous motion estimation and 

apparent occlusions or edge effects, are weighted down 45 mtensit y chan S e a*™*™- 

relative to ±e other pixels. Likewise, particularly uncertain Prediction motion estimation: The unknown image inten- 
frames are weighted down. Particularly uncertain single Sit y and tne bilinear mtensity Reference model (Reference 
observations for certain pels for certain frames are treated face and intensit y factor loadings) are repeatedly displaced 
more or less as missing values and modified within the relative t0 each other - ^ ma Y most eaSuV be attained b y 
bilinear estimation process to comply with the more certain 50 moviD g the "^own image to different positions and hold- 
observations, by the invention described in the fourth pre- m S the more complex bilinear intensity Reference model 
ferred embodiment. unmoved in reference position. 

Prediction intensity estimation: For each displacement, 

SIXTH PREFERRED EMBODIMENT the bilinear Reference model is fitted to the corresponding 

n image intensity to estimate the intensity scores, e.g. by some 

Flexible Yet Restricted Pattern Recognition fas( rC g rcssion Rcwcightcd partial l eas t squares 

Another application of bilinear intensity modelling com- regression may be used in order to reduce effects of outlier 

bined with motion estimation is intended to allow a flexible pixels due to e.g. sigarettes or other small unexpected 

pattern recognition with limited computational require- abnormalities. 

mcnts: 60 The (weighted) lack-of-fit residual between the image's 

Summary: The over- all pattern recognition goal is here to intensity and the bilinear Reference model (in one or more 

find and identify, and possibly quantify, an unknown object color channels) are computed and assessed, 

in an image, by searching for a match to one or more known Final prediction result: The displacement that gives the 

objects. The motion estimation concerns finding where in smallest weighted lack-of-fit residual variance may be taken 

the image the unknown object is. The role of the bilinear 65 as the position of the unknown face, and the corresponding 

intensity model is to allow each known object to represent intensity scores and residuals may be taken as parameters 

a whole class of related objects. (A bilinear motion model characterizing the given unknown face. 
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Combined motion and intensity modelling: In order to the scores both of the motion model and the intensity change 

allow the unknown face to have another size and inclination model, using e.g. a nonlinear iterative residual minimization 

than the ones used in the bilinear modelling (after optional (J. A. Nelder and R. Mead, * A simplex method for function 

normalization), the prediction process may be repeated with minimizabon', Computer Journal, vol. 7, p, 308-313). 
the normalized intensity model scaled and rotated by dif- 5 An unknown image may be searched using two or more 

ferent affine transformation scores, and the best over-all fit sucn mo dels (e.g. model of men's faces, model of womens' 

is search for. Thereby not only the position but also the size f aceS) mo del of children's faces), and the model that shows 

and inclination of the face is estimated. mc best fit for a certain image region is the chosen. 

Application using general motion estimation: 

Optionally, a motion estimation and accompanying 
motion modelling may be used in the calibration phase so To tDe extem mere MG iterative steps in the estimation 

that not only intensity differences and coarse, affine motions processes in the above preferred embodiments, various 

are allowed, but also other types of shape differences. This control parameters may be relaxed as a function in of 

may be done by bilinear modelling of motion fields or their iteration number and model performance. Among the 

residuals after affine transformation, resulting in motion parameters to be relaxed are: 

scores and motion loadings for various factors. Additionally, ^ x e ... - ... c 

_ , . s ... . . " 1) Smoothing parameter: Smoothing parameters for 

extra factors spanning known typical motion patterns arising < ac . t : mnti ° L„,, u ~ a „ „°iJl„^u~ ;„ 

r 4 • ' l L j c -i. motion estimation may be relaxed, e.g. as describe in Optic 

e.g. from tilting or turning the head, from smiling or ^ o- u /, fl ni\ T rnr ^ . 

. & . . * . i j _j * ■ , , ™ , , , Flow Computation , A. Smgh, (1991), IEEE Computer 

laughing, may be mcluded in the motion model: Ine load- ~ • * r»_ to i.* i * ^ %~ * i j j i_ 

■ r .V . c l • i , . , 20 Society Press, pp. 38-41, which is hereby included by 

ings of these extra factors may have been obtained cram e n * • l j 

^ „ , , • , • ■ . r l reference. Early in an estimation process, a harder smooth- 

controlled experiments involving motion estimation of the . , , . A , , . *. 

. . , F _ - - 6 .„ . ine should be done than later in the process, 

person with the Reference face, seen when tiffing or turning ° r 
or talking. 2) Pyramid impact parameter: In the case of hierarchical, 

t, „ f . • * jij- multi-resolution motion estimation, the parameter that regu- 

lne Reference model now contains two model domains: . . . e . r , , , . 

j i j • , , jit .« • • 25 lates the impact of results from one resolution level on the 

motion model and intensity change model, both pertaining , r . . ~ . . 

„ r -4* / Tu c next, may be relaxed. Early in an estimation process low- 

to one Reference position (e.g. the average face, or one , . J n . 7 , 1 1 , y . . . 

i - i c \ tm- ^ 1 ii • i j l ii_ ii_ resolution results may have higher impact than later m the 
typical lace). The motion model includes both the coarse, J ° r 

affine motion modelling and the fine bilinear motion model. P rocess - 

This dual domain restricted bilinear modelling, which 3) Intensity impact parameter: When correcting for inten- 
allows for certain shape variations and certain intensity 30 sity changes in multi-domain estimation and modelling, then 

variation, may be used in various search processes. onl y intensity changes that are consistent over several 

One such search process is to apply the model around frames ' an u d thereb y ar e relatively certain to reflect genuine 

various affine motions (translations, scaling, rotation) intensit y chaD S es and n u ot l J nn ? ode ; led motion falsely treated 
applied to the unknown image: For the affine motion per- „ as /* ten f X sh ™ld be allowed. This can partly be 

form a local motion estimation, between the moved achieved by letting intensity changes have little impact on 

unknown image and the Reference image or some transfer- ^ intensit y correctl0D earl V sta S e of an estimation process, 
mation thereof within the bilinear motion and intensity 4) Segmentation sensitivity to details: Early in an estima- 

change models. The obtained local motion field is regressed tion process, the estimated motion information and also 
on the motion loadings of the Reference model, to estimate 40 otner information, is relatively uncertain. It may therefore be 

local motion scores, to estimate the systematic fine posi- suboptimal to segment based on too small spatial details, 

tioning and reshaping of the unknown face, within the relative to their uncertainty, early in the estimation process, 

subspace of allowed fine motions. Most segmenting methods operating on still images have a 

The intensity difference between the motion compensated threshold that influences how small details will be consid- 
input image and the Reference image is projected on the 45 e 

bilinear intensity loadings to estimate the intensity scores Other A lications 

and the resulting intensity residual image and lack-of-fit ™ 

variance. As above, the affine motion with the lowest lack- The above technique for coordinating motion estimation 

of-fit variance is chosen as the final one, and the correspond- for different frames via a mathematical bilinear model is also 
ing bilinear scores for non-affine motions and intensity 50 applicable to other types of data. Examples of such data are: 
changes, as well as the resulting intensity residuals, give the 

characteristics of the individual unknown face. These data Sound 
may e.g. be used for more detailed pattern recognition 
purposes. 

In addition to only size and inclination correction, full 55 A sound frame may represent an energy vector recorded 

face shape correction may also be included. In this case a full over a fixed or varying length of time, and may be given as 

bilinear modelling of facial shape variations is included in a function of time. * Motion estimation* is this case may 

this invention: During the calibration phase, systematic detect short-term temporal shifts in the time pattern in 

shape variations for the different normalized face images, comparison to a reference sound frame, e.g. describing 
relative to the referencing image, may be detected by motion 60 velocity differences in different people's pronounciation of 

estimation and summarized by linear motion modelling. a word or a sentence. The bilinear modelling of the time 

Likewise, systematic intensity variations of the motion shifts from many repeated frames (recordings) of the same 

compensated face images are detected as difference images word or sentence serves to generate a model of the system- 

at the reference position and summarized by bilinear inten- atic timing variations involved. Bilinear modelling of 
sity modelling, as described in the previous embodiments. 65 frames' time compensated energy vectors represent addi- 

During predictive pattern recognition for a known face, the tional systematic intensity variations in the sound. The 

search process is supplemented with a process that estimates bilinear models may in turn be used for facilitating subse- 
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quent 'motion 1 estimators of short-term temporal shifts as 
described for video images. 

Vibration Frequency Spectra 

Alternatively, the sound frames may be given e.g. as 5 
frequency spectra, after a Fourier Transform or subband/ 
wavelet transform of the time frames recorded. In this case 
the 'motion estimation* may detect shifts in the frequency 
spectrum of each frame relative to a reference frequency 
spectrum, e.g. describing how the overtone series shifts 10 
systematically when a given music instrument is played at a 
different pitch. The bilinear modelling of the estimated 
frequency shifts show how the overtone series systemati- 
cally moves when the pitch is changed. The bilinear mod- 
elling of the pitch corrected intensities reveals systematic 15 
intensity changes beyond the frequency shifting. The bilin- 
ear models may in turn be used for facilitating subsequent 
'motion* estimation of frequency shifts. 

Vibration Energy Images 

To accommodate variations on different lime scales, the 
sound frames may be recorded in more than one dimension. 
A two-way example similar to video images is when each 
frame represents the frequency spectrum of the sound 
energy, recorded over e.g. a millisecond (ordinate) vs. time, 25 
e.g. for 1000 milliseconds (abscissa). Motion estimation 
relative to a reference frame allows detection of both fre- 
quency shifts and temporal delays. Subsequent bilinear 
modelling of the motion over several frames detects sys- 
tematic patterns in frequency and timing shifts. Bilinear 3Q 
modelling of the motion compensated energies detects sys- 
tematic patterns in the intensities beyond the frequency and 
timing shifts. These bilinear models may be fed back to 
enhance subsequent motion estimations. 

The bilinear model parameters involved (scores, loadings 35 
and residuals) for sound may be used for digital compression 
of audio data. They may also be used in order to give a 
compact model of the sound patterns, used e.g. for post- 
editing of the sound, in video games, etc. They may also be 
used for process control and for automatic error warnings, 40 
e.g. when the vibration data come from mechanical equip- 
ment such as different vibration sensors in a car, a ship or an 
airplane. The sound scores may be related to corresponding 
image information or bilinear image scores form approxi- , 
mately the same time frames, for further video compression, 4S 
lip synchronization, etc. The bilinear modelling of the sound 
data may be performed jointly with the bilinear modelling of 
the video data, e.g. by PLS2 regression (Martens & Naes 
1989) or Consensus PCA/PLS (Martens & Martens 1986, 
Geladi et al 1988). 50 

Other applications of combined motion estimation and 
bilinear modelling are in analytical chemistry: 

An application of the present invention is the coordinated 
estimation and modelling of systematic position changes and 
intensity changes over multiple observations in spectrom- 55 
etry. One example of this is nuclear magnetic resonance 
(NMR) spectroscopy and consists of estimating and mod- 
elling the so-called 'chemical shifts* (corresponding to 
'motion* in the previous video coding explanation) and 
concentration -controlled changes in peak heights ('intensity 60 
changes*) of various types of molecular functions (possible 
'holons'), recorded e.g. at different frequencies ('pixels') in 
a set of different but related chemical samples ('sequence of 
frames)). Electron spin resonance (ESR) spectroscopy can 
be analyzed similarly. 65 

Another type of chemical application is spectrophotom- 
etry of various sorts (e.g. transmission, reflectance, 



flourescence, Raman) in various electromagnetic wave- 
length ranges, e.g. from X-ray to radio frequency. For 
instance, in the ultraviolet/visible/infrared range, the appli- 
cation of the present invention could correspond to detecting 
solvent induced wavelength shifts ('motion') and 
concentration -controlled absorbance changes ('intensity 
change') of various types of molecules or molecular groups 
(possible 'holons'), recorded at different wavelengths, 
wavenumbers or time-of-flights ('pixels*) in a set of differ- 
ent but related chemical samples ('sequence of frames'). 

Yet a class of applications of the present invention con- 
cerns physical separation techniques such as chromatogra- 
phy and electrophoresis and flow injection analysis. For 
instance, in high pressure liquid chromatography separation 
of chemical compounds, the application of the present 
invention could correspond to detecting retention time 
changes ('motion' induced by changes in the stationary 
phase of the column) and concentration controlled detector 
signal changes ('intensity changes') of various chemical 
compounds (possible 'holons'), recorded at different chro- 
matographic retention times ('pixels') in a set of different 
but related chemical samples ('sequence of frames'). 

In such quantitative analysis applications the way to 
combine holons is generally simpler than in video coding, 
since the effects of overlapping holons usualizu can be added 
together without any regard for occlusion. Therefore the 
need for segmentation is less than in video coding. 

Examples of other application are: 

2D multi-channel color video images, ultrasound images, 
or satellite images, or radar image data, 2- or 3D images 
from computer tomography, or Magnetic Resonance 
Imaging, ID line camera data. 

While the invention has been particularly shown and 
described with reference to the preferred embodiments 
thereof, it will be understood by those skilled in the art that 
various changes in form and detail may be made therein 
without departing form the spirit and scope of the invention. 
Particularly, the term "plurality" can be interpreted in the 
sense of "one or more". 

What is claimed is: 

1. A method for estimating motion between one reference 
image and one or more frames in a sequence of two or more 
frames, each frame consisting of a plurality of pixels of an 
input image, comprising the steps of: 
estimating motion from the reference image to two or 

more of the frames to produce motion fields; 
transforming the motion fields into a motion matrix, 
where each row of the motion matrix corresponds to 
one motion field and each row of the motion matrix 
contains a vertical and a horizontal component of a 
motion vector pertaining to each pixel of the reference 
image; and 

performing Principal Component Analysis on the motion 
matrix to produce a factorization of the motion matrix 
into the product of a motion score matrix and a motion 
loading matrix, the motion score matrix comprising a 
plurality of motion score vectors arranged as column 
vectors, the motion loading matrix comprising a plu- 
rality of motion loading vectors arranged as row 
vectors, the factorization forming a bilinear model in 
which one column of said motion score matrix and one 
corresponding row of said motion loading matrix 
together comprise a factor of the bilinear model, the 
number of factors being lower than or equal to the 
number of said frames; 

wherein results from the Principal Component Analysis 
on the motion matrix are used to influence further 
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estimation of motion from the reference image to one 
or more of the frames. 

2. The method according to claim 1, comprising the steps 
of: 

(1) estimating a particular motion field for each frame 5 
from the reference image to the respective frame 

(2) transforming the estimated motion field for each frame 
into a motion matrix; 

(3) performing Principle Component Analysis on the 
motion matrix associated with each frame; 10 

(4) for each frame, multiplying the entries of the motion 
score matrix corresponding to the frame by the corre- 
sponding motion loading vectors to produce a motion 
hypothesis for each frame; 

(5) for each frame, estimating a motion field from the 15 
reference image to said frame, using the motion 
hypothesis as side information; and 

(6) outputting the motion fields estimated in step (5), the 
output motion fields representing the motion between 
said reference image and each respective frame in the 20 
sequence of frames. 

3. The method according to claim 2, wherein steps (2) to 
(5) are repeated for a plurality of passes through said 
sequence of frames. 

4. The method according to claim 3, wherein said motion 25 
estimation is performed with reference to a smoothness 
parameter which is a decreasing function of pass number, a 
smoother motion field being produced for a higher value of 
said smoothness parameter. 

5. The method according to claim 3, wherein said motion 30 
hypothesis is formed with reference to a hypothesis impact 
parameter which is an increasing function of pass number, a 
larger hypothesis impact parameter producing a motion 
hypothesis which has a greater impact on said motion 
estimation. 35 

6. The method according to claim 2, further comprising 
the steps of: 

(5b) re-estimating the bilinear model based on the motion 
field found in step (5); 4Q 

(5c) multiplying the motion score matrix for the given 
frame by the motion loading vectors from the 
re-estimated bilinear model to produce a second motion 
hypothesis; and 

(5d) estimating a particular motion field from the refer- 45 
ence image to the frame, using said second motion 
hypothesis as side information, 
wherein the motion field estimated in step (5d) represents 
the motion between the reference image and the frames. 

7. The method according to claim 2, wherein during the 50 
Principal Component Analysis of step (3) and the formation 

of the motion hypotheses in step (4) uncertainly estimates 
for the motion hypothesis are generated and subsequently 
used to control the degree of impact of the motion hypoth- 
esis as side information in the motion estimation in step (5). 55 

8. The method of claim 2, wherein the collection of 
motion score vectors and motion loading vectors estimated 
in step (3) represents the motion fields. 

9. The method according to claim 1, the method com- 
prising the steps of: 60 

(1) estimating motion from the reference image to the first 
of said frames; 

(2) forming the motion matrix from the estimated motion 
between the reference image and the first of said 
frames; 65 

(3) performing Principal Component Analysis of the 
formed motion matrix; 
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(4) for a next one of said frames, predicting a value for the 
motion score vectors using extrapolation from previous 
entries of the motion score matrix and multiplying the 
predicted entries of motion score vectors with the 
motion loading vectors to produce a motion hypothesis 
for each frame; 

(5) for said next frame, estimating a particular motion 
field from the reference image to the frame, using the 
motion hypothesis as side information; and 

(6) repeating steps (2) to (5) for respective subsequent 
next frames until no more frames remain in the 
sequence, 

wherein the motion fields estimated in step (5) represent the 
motion between the reference image and the frames. 

10. The method according to claim 1, comprising the steps 

of: 

(1) estimating motion from the reference image to the first 
frame in said sequence; 

(2) forming a motion row vector containing each compo- 
nent of the motion vector for each element of the 
reference image and with one row for each frame; 

(3) updating the bilinear model based on the appended 
new motion row vector; 

(4) for a next of said frames, predicting a value for the 
motion score matrix using extrapolation from previous 
entries of the motion score matrix and multiplying the 
predicted entries of the motion score matrix by the 
motion loading vectors to produce a motion hypothesis 
for each frame; 

(5) for the next frame, estimating a motion field from the 
reference image to the frame, using the motion hypoth- 
esis as side information; and 

(6) repeating steps (2) to (5) for respective subsequent 
next frames until the last frame in said sequence has 
been processed; 

wherein the motion field estimated in step (5) represents the 
motion between the reference image and the frames. 

11. The method of claim 10, wherein steps (2) to (6) are 
repeated for a plurality of passes through said sequence. 

12. The method of claim 11, wherein steps (2) to (5d) are 
repeated for a plurality of passes through said sequence. 

13. The method of claim 1, further comprising the step of 
using an intermediate bilinear model of the motion matrix as 
side information for motion analysis in higher spatial 
resolution, the intermediate bilinear model having motion 
loading vectors of reduced spatial resolution relative to the 
motion loading vectors of the bilinear model. 

14. The method according to claim 13, wherein a degree 
of use of said bilinear model is indicated by a pyramid 
impact parameter that is a decreasing function of pass 
number, a large value for said pyramid impact parameter 
resulting in a strong influence of said side information. 

15. The method according to claim 1, wherein a segment 
field is used to select a part of said reference image and 
motion estimation is performed only for the selected part of 
said reference image. 

16. The method of estimating motion according to claim 
1, wherein the reference image is part of the sequence of 
frames, and further comprising the step of: 

segmenting said reference image based on the estimated 
motion to produce a plurality of segment fields. 

17. The method of claim 16, further comprising the steps 

of: 

estimating motion for each segment field; and 
repeating the segmenting and estimating motion for the 
segment field steps for a plurality of passes, the col- 
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lection of estimated motion fields in the last of said 
passes representing segmentwise motion. 

18. The method according to claim 17, wherein segment- 
ing of said reference image is performed according to a 
segment detail parameter which is an increasing function of 
pass number, a higher value of said segment detail parameter 
producing a more detailed segmenting. 

19. The method of claim 1, further comprising the step of 
performing an Optimal Scaling on the bilinear model. 

20. The method of claim 1, wherein the step of performing 
Principal Component Analysis includes reweighting. 

21. The method of claim 1, wherein the step of performing 
Principal Component Analysis includes the step of handling 
missing values in the input data which correspond to areas 
in said reference frame where said motion estimation was 
not successful for the corresponding given frame. 

22. The method of claim 1, further comprising the step of 
preprocessing the frames to normalize intensity and position 
of the frames. 

23. The method of claim 1, wherein said motion matrix is 
augmented with score matrices taken from bilinear models 
of supplementary data matrices for the same frames used to 
generate the motion matrix, said supplementary data matri- 
ces containing data from one of (a) motions for other holons, 
(b) intensity changes, (c) motions estimated in an earlier 
stage, and (d) motions estimated at another spatial resolu- 
tion. 

24. The method of claim 1, wherein the step of performing 
Principal Component Analysis comprises smoothing said 
motion loading vectors. 

25. The method of claim 1, wherein the step of performing 
Principal Component Analysis comprises smoothing said 
motion score vectors. 

26. The method of claim 1, wherein there is one motion 
matrix for each spatial dimension in said reference image, 
and each said component of each said motion vector is 
placed in the motion matrix that corresponds to said spatial 
dimension. 

27. The method of claim 1, further comprising the step of 
approximating a specific frame from the reference image by 
moving the reference image according to estimated motion. 

28. The method of claim 1, comprising the steps of: 
selecting a specific frame; 

estimating motion from the reference frame to the specific 
frame; and 

regressing the motion found in the estimating step on the 
motion loading vectors to produce entries for the 
motion score matrix corresponding to the selected 
specific frame. 

29. The method according to claim 1, further comprising 
the steps of: 

(1) selecting a specific frame; 

(2) initializing a set of motion score vectors to start 
values, the number of said motion score vectors being 
equal to the number of said motion loading vectors; 

(3) for each of a plurality of trial score combinations, 
computing a motion field by multiplying said trial 
combination of motion score vectors by said motion 
loading vector, moving a second image according to 
said motion field to produce a reconstruction, comput- 
ing a fidelity measurement according to the difference 
between said reconstruction and a first image, each trial 65 
score combination being computed as a perturbation of 
said motion scores; 
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(4) computing new motion scores dependent on said trial 
combination of motion score vectors and said fidelity 
measurement; and 

(5) repeating steps (3) and (4); 

wherein said motion load vectors and said motion score 
vectors computed by the last repetition of step (4) represent 
motion to be approximated for the selected specific frame. 

30. The method according to claim 1, wherein an intensity 
score matrix comprises a plurality of intensity score vectors 
arranged in columns and an intensity loading matrix com- 
prises intensity loading vectors arranged in rows, each 
intensity score vector element corresponding to one frame, 
each element of each intensity loading vector corresponding 
to one element of the reference image, one column of said 
intensity score matrix and one intensity loading vector 
together constitute a factor, the number of factors being less 
than or equal to the number of said frames, the sum of 
factors describing intensity changes in said reference image, 

the step of estimating motion comprising the substep of 
adjusting at least one of the reference image and a 
specific frame according to the intensity loading vec- 
tors. 

31. The method according to claim 30, further comprising 
the steps of: 

(1) predicting intensity score entries of the intensity score 
vectors for said specific frame by one of interpolating 
and extrapolating from entries of the intensity score 
vectors from related frames; 

(2) computing an intensity-corrected reference image 
from the product of intensity scores predicted in step 
(1) and the intensity loading vectors, plus said refer- 
ence image; and 

(3) estimating motion from said intensity-corrected ref- 
erence image to said specific frame, the estimated 
motion representing motion relative to said reference 
image for said specific frame. 

32. The method of claim 1, further comprising the steps 
of: 

(1) computing an intensity-corrected reference image as 
the product of the intensity change score vector entries 
for a specific frame and the intensity change loading 
vectors, plus said reference image; 

(2) estimating motion from said intensity-corrected ref- 
erence image to said specific frame; 

(3) projecting the motion estimated in step (2) on said 
motion loading vectors to produce motion scores for 
said specific frame; 

(4) computing a motion field by multiplying the motion 
score vector entries produced in step (3) by the motion 
loading vectors; 

(5) moving said specific frame backwards according to 
the motion field computed in step (4) to produce a 
motion compensated image; 

(6) calculating an intensity difference between the motion 
compensated image produced in step (5) and said 
reference image; 

(7) projecting the difference calculated in step (6) on the 
intensity change loading vectors to produce intensity 
change score vector entries for said specific frame; and 

(8) repeating steps (l)-(7) 0De or more times; 
wherein the specific frame is described relative to the 
reference image according to intensity change loading 
vectors, motion loading vectors, and initial intensity change 
score vector entries for the specific frame. 
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33. The method of claim 1, further comprising the steps 
of: 

(1) initializing an intensity change model comprising 
intensity change score vectors and intensity change 
loading vectors to empty; 

(2) initializing a motion model comprising motion score 
vectors and motion loading vectors to empty; 

(3) choosing an unprocessed frame; 

(4) if a non-empty intensity change model is available, 
predicting intensity change score vector entries for the 
unprocessed frame by one of interpolating and extrapo- 
lating to related frames, computing an intensity correc- 
tion by multiplying the predicted intensity change score 
vector entries for said unprocessed frame by the inten- 
sity change loading vectors, computing an intensity- 
corrected reference image by adding said intensity 
correction to said reference image, 

otherwise setting the intensity-corrected reference 
image to be equal to said reference image; 

(5) estimating motion from said intensity-corrected ref- 
erence image to said unprocessed frame; 

(6) updating said motion model according to the motion 
estimated in step (5); 

(7) computing a motion compensation field by multiply- 
ing motion score vector entries for said unprocessed 
frame by said motion loading vectors; 

(8) moving said unprocessed frame backwards according 
to the motion compensation field, thereby producing a 
motion-compensated image; 

(9) calculating the difference between said motion - 
compensated image and said reference image; 

(10) updating said intensity model according to the dif- 
ference calculated in step (9); 

(11) repeating steps (3)-(10) for each frame in said 
sequence; 

wherein the motion score vectors and motion loading vec- 
tors resulting from the last repetition of step (6) and the 
intensity change score vectors and intensity change loading 
vectors resulting from the last repetition of step (10) together 
represent the motion and intensity changes for the reference 
image relative to each frame in the sequence. 

34. The method of claim 33, wherein the step of intensity 
modeling includes calculating uncertainties, adjusting said 
intensity corrections according to said uncertainties by at 
least one of smoothing, multiplying by, or subtracting from 
said intensity correction in accordance with said uncertain- 
ties. 

35. The method of claim 33, wherein said intensity 
corrections are adjusted according to an intensity relaxation 
parameter which is a decreasing function of repetitions, a 
small intensity relaxation parameter resulting in a small 
intensity correction. 

36. The method of claim 33, wherein the motion modeling 
includes the steps of calculating uncertainties and smoothing 
said motion compensation field according to said uncertain- 
ties. 

37. The method of claim 33, wherein the motion is 
smoothed according to a motion relaxation parameter which 
is a decreasing function of repetitions, a small motion 
relaxation parameter resulting in little smoothing. 

38. The method of claim 37, wherein steps (3)-(ll) are 
repeated for several passes. 

39. The method to claim 38, wherein after the step of 
moving backwards, a Multiplicative Signal Correction is 
performed. 



10 



20 



25 



30 



35 



40 



45 



50 



55 



60 



65 



40. The method according to claim 1, further comprising 
the steps of: 

(1) initializing an intensity change model consisting of 
intensity change score vectors and intensity change 
loading vectors to empty; 

(2) initializing a motion model consisting of motion score 
vectors and motion loading vectors to empty; 

(3) choosing an unprocessed frame; 

(4) if a non-empty motion model is available, predicting 
motion score vector entries by one of interpolating and 
extrapolating to related frames, computing a motion 
compensation field by multiplying the predicted motion 
score vector entries by the motion loading vectors, 
moving said unprocessed frame backwards using the 
motion compensation field thus producing a motion- 
compensated image, 

otherwise setting the motion-compensated image to be 
equal to said unprocessed frame; 

(5) calculating a difference between said motion- 
compensated image and said reference image; 

(6) updating said intensity model according to the differ- 
ence calculated in step (5); 

(7) computing an intensity correction by multiplying the 
intensity change score vector entries updated in step (6) 
corresponding to said frame by the intensity change 
loading vectors updated in step (6); 

(8) adding said intensity correction to said reference 
image to produce an intensity-corrected image; 

(9) estimating motion from said intensity-corrected image 
to said unprocessed frame; 

(10) updating the motion model with the motion estimated 
in step (9); 

(11) repeating steps (3)~{1Q); 

wherein the motion score vectors and motion loading vec- 
tors resulting from the last repetition of step (10) and the 
intensity score vectors and intensity loading vectors result- 
ing from the last repetition of step (6) together represent the 
motion and intensity changes for the reference image rela- 
tive to each frame in the sequence. 

41. The method according to claim 1, wherein said 
intensity model is initialized according to a set of chosen 
intensity patterns. 

42. An apparatus for estimating motion between one 
reference image and one or more frames in a sequence of 
two or more frames, each frame consisting of a plurality of 
pixels of an input image, said apparatus comprising: 

means for estimating motion from the reference image to 
two or more of the frames to produce motion fields; 

means for transforming the motion fields into a motion 
matrix, where each row of the motion matrix corre- 
sponds to one motion field and each row of the motion 
matrix contains a vertical and a horizontal component 
of a motion vector pertaining to each pixel of the 
reference image; and 

means for performing Principal Component Analysis on 
the motion matrix to produce a factorization of the 
motion matrix into the product of a motion score matrix 
and a motion loading matrix, the motion score matrix 
comprising a plurality of motion score vectors arranged 
as column vectors, the motion loading matrix compris- 
ing a plurality of motion loading vectors arranged as 
row vectors, the factorization forming a bilinear model 
in which one column of said motion score matrix and 
one corresponding row of said motion loading matrix 
together comprise a factor of the bilinear model, the 
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number of factors being lower than or equal to the 

number of said frames; 
wherein results from the Principal Component Analysis 

on the motion matrix are used to influence further 

estimation of motion from the reference image to one 

or more of the frames. 
43. A data structure stored in a computer readable medium 
and suitable for representing motion between one reference 
image and each frame in a sequence of frames, said frames 
consisting of a plurality of data samples arranged in a spatial 
pattern, said data structure comprising: 



,677 
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(1) a plurality of motion patterns represented as loading 
vectors, each element of each loading vector corre- 
sponding to one element of said reference image; and 

(2) a plurality of motion score vectors, each motion score 
vector corresponding to one of said frames, each 
motion score vector consisting of the same number of 
elements as the number of loading vectors, each motion 
score element of each motion score vector representing 
bow much the corresponding loading vector should 
contribute to the total motion for said one frame. 

> 

***** 
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